WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau 




PCT 

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification 6: C07H 21/04, C12Q 1/68
A1
(11) International Publication Number: WO 95/12607
(43) International Publication Date: 11 May 1995 (11.05.95)



(21) International Application Number: PCT/US94/12632

(22) International Filing Date: 2 November 1994 (02.11.94)



(30) Priority Data:
08/145,145    3 November 1993 (03.11.93)    US
08/216,538    23 March 1994 (23.03.94)      US



(71) Applicant: MOLECULAR TOOL, INC. [US/US]; 5210 Eastern
Avenue, Baltimore, MD 21224 (US).

(72) Inventors: GOELET, Philip; 301 Western Run Road,
Cockeysville, MD (US). KNAPP, Michael, R.; 2630 N. Calvert
Street, Baltimore, MD (US).

(74) Agents: AUERBACH, Jeffrey, I. et al.; Howrey & Simon, 1299 
Pennsylvania Avenue, NW, Washington, DC 20004 (US). 



(81) Designated States: AU, CA, JP, European patent (AT, BE, 
CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT,
SE). 



Published 

With international search report. 



(54) Title: SINGLE NUCLEOTIDE POLYMORPHISMS AND THEIR USE IN GENETIC ANALYSIS 
(57) Abstract 



Molecules and methods suitable for identifying polymorphic sites in the genome of a plant or animal. The identification of such sites 
is useful in determining identity, ancestry, predisposition to genetic disease, the presence or absence of a desired trait, etc. 






FOR THE PURPOSES OF INFORMATION ONLY 

Codes used to identify States party to the PCT on the front pages of pamphlets publishing international 
applications under the PCT. 



AT  Austria                   GB  United Kingdom                          MR  Mauritania
AU  Australia                 GE  Georgia                                 MW  Malawi
BB  Barbados                  GN  Guinea                                  NE  Niger
BE  Belgium                   GR  Greece                                  NL  Netherlands
BF  Burkina Faso              HU  Hungary                                 NO  Norway
BG  Bulgaria                  IE  Ireland                                 NZ  New Zealand
BJ  Benin                     IT  Italy                                   PL  Poland
BR  Brazil                    JP  Japan                                   PT  Portugal
BY  Belarus                   KE  Kenya                                   RO  Romania
CA  Canada                    KG  Kyrgystan                               RU  Russian Federation
CF  Central African Republic  KP  Democratic People's Republic of Korea   SD  Sudan
CG  Congo                                                                 SE  Sweden
CH  Switzerland               KR  Republic of Korea                       SI  Slovenia
CI  Côte d'Ivoire             KZ  Kazakhstan                              SK  Slovakia
CM  Cameroon                  LI  Liechtenstein                           SN  Senegal
CN  China                     LK  Sri Lanka                               TD  Chad
CS  Czechoslovakia            LU  Luxembourg                              TG  Togo
CZ  Czech Republic            LV  Latvia                                  TJ  Tajikistan
DE  Germany                   MC  Monaco                                  TT  Trinidad and Tobago
DK  Denmark                   MD  Republic of Moldova                     UA  Ukraine
ES  Spain                     MG  Madagascar                              US  United States of America
FI  Finland                   ML  Mali                                    UZ  Uzbekistan
FR  France                    MN  Mongolia                                VN  Viet Nam
GA  Gabon














TITLE OF THE INVENTION 



SINGLE NUCLEOTIDE POLYMORPHISMS 
AND THEIR USE IN GENETIC ANALYSIS 



FIELD OF THE INVENTION 

The present invention is in the field of recombinant DNA
technology. More specifically, the invention is directed to
molecules and methods suitable for identifying single nucleotide
polymorphisms in the genome of an animal, especially a horse or
a human, and using such sites to analyze identity, ancestry or
genetic traits.

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application is a continuation-in-part of U.S. Patent 
Application Serial No. 08/145,145 (filed November 3, 1993). 

BACKGROUND OF THE INVENTION



The capacity to genotype an animal, plant or microbe is of
fundamental importance to forensic science, medicine,
epidemiology and public health, and to the breeding and
exhibition of animals. Such a capacity is needed, for example, to
determine the identity of the causative agent of an infectious
disease, to determine whether two individuals are related, or to
establish whether a particular animal such as a horse is a
thoroughbred.

The analysis of identity and parentage, along with the
capacity to diagnose disease, is also of central concern to human,
animal and plant genetic studies, particularly forensic or
paternity evaluations, and in the evaluation of an individual's
risk of genetic disease. Such goals have been pursued by
analyzing variations in DNA sequences that distinguish the DNA
of one individual from another.

Where such variations alter the lengths of the fragments that
are generated by restriction endonuclease cleavage, they
are referred to as restriction fragment length
polymorphisms ("RFLPs"). RFLPs have been widely used in human
and animal genetic analyses (Glassberg, J., UK Patent Application
2135774; Skolnick, M.H. et al., Cytogen. Cell Genet. 32:58-67
(1982); Botstein, D. et al., Am. J. Hum. Genet. 32:314-331
(1980); Fischer, S.G. et al., PCT Application WO90/13668; Uhlen,
M., PCT Application WO90/11369). Where a heritable trait can
be linked to a particular RFLP, the presence of the RFLP in a
target animal can be used to predict the likelihood that the
animal will also exhibit the trait. Statistical methods have
been developed to permit the multilocus analysis of RFLPs such
that complex traits that are dependent upon multiple alleles can
be mapped (Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.)
83:7353-7357 (1986); Lander, S. et al., Proc. Natl. Acad. Sci.
(U.S.A.) 84:2363-2367 (1987); Donis-Keller, H. et al., Cell
51:319-337 (1987); Lander, S. et al., Genetics 121:185-199
(1989), all herein incorporated by reference). Such methods can
be used to develop a genetic map, as well as to develop plants or
animals having more desirable traits (Donis-Keller, H. et al., Cell
51:319-337 (1987); Lander, S. et al., Genetics 121:185-199
(1989)).

In some cases, the DNA sequence variations are in regions
of the genome that are characterized by short tandem repeats
(STRs) that include tandemly repeated di- or tri-nucleotide
motifs. These tandem repeats are also referred to as
"variable number tandem repeat" ("VNTR") polymorphisms.
VNTRs have been used in identity and paternity analysis (Weber,
J.L., U.S. Patent 5,075,217; Armour, J.A.L. et al., FEBS Lett.
307:113-115 (1992); Jones, L. et al., Eur. J. Haematol. 39:144-147
(1987); Horn, G.T. et al., PCT Application WO91/14003;
Jeffreys, A.J., European Patent Application 370,719; Jeffreys,
A.J., U.S. Patent 5,175,082; Jeffreys, A.J. et al., Amer. J. Hum.
Genet. 39:11-24 (1986); Jeffreys, A.J. et al., Nature 316:76-79
(1985); Gray, I.C. et al., Proc. R. Acad. Soc. Lond. 243:241-253
(1991); Moore, S.S. et al., Genomics 10:654-660 (1991);
Jeffreys, A.J. et al., Anim. Genet. 18:1-15 (1987); Hillel, J. et al.,
Anim. Genet. 20:145-155 (1989); Hillel, J. et al., Genet. 124:783-789
(1990)) and are now being used in a large number of genetic
mapping studies.

A third class of DNA sequence variation results from
single nucleotide polymorphisms (SNPs) that exist between
individuals of the same species. Such polymorphisms are far
more frequent than RFLPs, STRs and VNTRs. In some cases, such
polymorphisms comprise mutations that are the determinative
characteristic in a genetic disease. Indeed, such mutations may
affect a single nucleotide in a protein-encoding gene in a manner
sufficient to actually cause the disease (e.g., hemophilia,
sickle-cell anemia). In many cases, these SNPs are in noncoding
regions of a genome. Despite the central importance of such
polymorphisms in modern genetics, no practical method has been
developed that permits the highly parallel analysis of
many SNP alleles in two or more individuals in genetic analysis.
The present invention provides such an improved method.

Indeed, the present invention provides methods and gene
sequences that permit the genetic analysis of identity and
parentage, and the diagnosis of disease by discerning the
variation of single nucleotide polymorphisms.

SUMMARY OF THE INVENTION

The present invention is directed to molecules that
comprise single nucleotide polymorphisms (SNPs) that are
present in mammalian DNA, and in particular, to equine and
human genomic DNA polymorphisms. The invention is directed to
methods for (i) identifying novel single nucleotide
polymorphisms, (ii) the repeated analysis and
testing of these SNPs in different samples, and (iii)
exploiting the existence of such sites in the genetic analysis of
single animals and populations of animals.

The analysis (genotyping) of such sites is useful in
determining identity, ancestry, predisposition to genetic
disease, the presence or absence of a desired trait, etc. In
detail, the invention provides a nucleic acid primer molecule
having a polynucleotide sequence complementary to an
"invariant" nucleotide sequence of a genomic DNA segment of a
mammal, the genomic segment being located immediately
3'-distal to a single nucleotide polymorphic site, X, of a single
nucleotide polymorphic allele of the mammal; and wherein
template-dependent extension of the nucleic acid primer
molecule extends the primer molecule by a
single nucleotide, the single nucleotide being complementary to
the nucleotide, X, of the single nucleotide polymorphic allele.
The invention particularly concerns the embodiment wherein the
mammal is selected from the group consisting of humans,
non-human primates, dogs, cats, cattle, sheep, and horses.

The invention particularly concerns the embodiments
wherein the mammal is a horse, and wherein the nucleic acid
molecule has a nucleotide sequence selected from the group
consisting of SEQ ID NO:(2n+1) [refer to Table 1], wherein n is an
integer selected from the group consisting of 0 through 35, or
wherein the sequence of the immediately 3'-distal segment
includes a sequence selected from the group consisting of SEQ ID
NO:(2n+2), wherein n is an integer selected from the group
consisting of 0 through 35.
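The SEQ ID numbering scheme above can be illustrated with a minimal sketch. It assumes only what the text states: each index n from 0 through 35 maps to an odd-numbered SEQ ID NO:(2n+1) and an even-numbered SEQ ID NO:(2n+2), for 72 sequences in all. The function name `seq_id_pair` is hypothetical and is not part of the disclosure.

```python
from __future__ import annotations


def seq_id_pair(n: int) -> tuple[int, int]:
    """Return (SEQ ID NO for 2n+1, SEQ ID NO for 2n+2) for SNP index n."""
    if not 0 <= n <= 35:
        raise ValueError("n must be an integer from 0 through 35")
    return 2 * n + 1, 2 * n + 2


# Enumerate all 36 pairs; together they cover SEQ ID NO:1 through SEQ ID NO:72.
pairs = [seq_id_pair(n) for n in range(36)]
```

For example, n = 0 gives the pair (1, 2), which matches the SEQ ID NO:1 and SEQ ID NO:2 assigned to clone 177-2 elsewhere in the text.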

The invention also provides a nucleic acid molecule having
a sequence complementary to a sequence selected from the group
consisting of SEQ ID NO:1 through SEQ ID NO:72. The invention
also provides a set of at least two of such nucleic acid
molecules.

The invention also provides a set of at least two nucleic
acid molecules, wherein at least one of the nucleic acid
molecules has a sequence complementary to a sequence selected
from the group consisting of SEQ ID NO:1 through SEQ ID NO:72.

The invention also provides a method for determining the
extent of genetic similarity between DNA of a target horse and
DNA of a reference horse, which comprises the steps:

A) determining, for a single nucleotide polymorphism of
the target horse, and for a corresponding single
nucleotide polymorphism of the reference horse,
whether the polymorphisms contain the same single
nucleotide at their respective polymorphic sites; and

B) using the comparison to determine the extent of genetic
similarity between the target horse and the reference
horse.
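Steps A and B can be sketched as a site-by-site comparison. This is only an illustration under stated assumptions: the SNP names and genotypes are hypothetical, each site is represented by a single nucleotide per animal, and the fraction of matching sites is used as the similarity score. The text does not prescribe any particular scoring rule.

```python
def genetic_similarity(target, reference):
    """Fraction of shared SNP sites at which both animals carry the same nucleotide.

    `target` and `reference` map a (hypothetical) SNP name to the nucleotide
    observed at its polymorphic site.
    """
    shared = sorted(set(target) & set(reference))
    if not shared:
        raise ValueError("no SNP sites in common")
    # Step A: compare the nucleotide at each corresponding polymorphic site.
    matches = sum(1 for site in shared if target[site] == reference[site])
    # Step B: summarize the comparison (assumed scoring: fraction of matches).
    return matches / len(shared)


# Hypothetical genotypes for illustration only.
target_horse = {"snp_a": "C", "snp_b": "T", "snp_c": "G", "snp_d": "A"}
reference_horse = {"snp_a": "C", "snp_b": "C", "snp_c": "G", "snp_d": "A"}
similarity = genetic_similarity(target_horse, reference_horse)
```

A diploid treatment would compare both alleles at each site; the single-nucleotide representation is kept here only for brevity.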

The invention also concerns the embodiment of such
method wherein the polymorphic sites are flanked by (1) an
immediately 5'-proximal sequence selected from the group
consisting of SEQ ID NO:(2n+1), and (2) an immediately 3'-distal
sequence selected from the group consisting of SEQ ID NO:(2n+2);
wherein n is an integer selected from the group consisting of 0
through 35.

The invention particularly concerns the embodiment
wherein, in step A, the determination is accomplished by a
method having the sub-steps:

(a) incubating a sample of nucleic acid containing the
single nucleotide polymorphism of the target horse, or
the single nucleotide polymorphism of the reference
horse, in the presence of a nucleic acid primer and at
least one dideoxynucleotide derivative, under
conditions sufficient to permit a polymerase mediated,
template-dependent extension of the primer, the
extension causing the incorporation of a single
dideoxynucleotide to the 3'-terminus of the primer, the
single dideoxynucleotide being complementary to the
single nucleotide of the polymorphic site of the
polymorphism;

(b) permitting the template-dependent extension of the
primer molecule, and the incorporation of the single
dideoxynucleotide; and

(c) determining the identity of the nucleotide incorporated
into the polymorphic site, the identified nucleotide
being complementary to the nucleotide of the
polymorphic site.
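The logic of sub-steps (a) through (c) can be modeled in a few lines. This is a toy sketch of the base-pairing logic only, not of the laboratory procedure: the sequences are invented, hybridization is reduced to an exact string match, and the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def interrogate(template, primer, ddntps=("A", "C", "G", "T")):
    """Return the dideoxynucleotide added to the primer's 3'-terminus.

    `template` is the strand (5'->3') that contains the polymorphic site.
    `primer` (5'->3') is complementary to the template region immediately
    3'-distal to that site. Returns None if the required terminator is
    not among the supplied `ddntps`.
    """
    binding_site = reverse_complement(primer)
    start = template.find(binding_site)
    if start < 1:
        raise ValueError("primer does not anneal immediately 3' of a template base")
    site_base = template[start - 1]        # nucleotide X at the polymorphic site
    incorporated = COMPLEMENT[site_base]   # single base added to the primer
    return incorporated if incorporated in ddntps else None


# Invented example: 5' flank, polymorphic site X, 3' flank.
distal = "TTGACCGTA"
primer = reverse_complement(distal)
allele_c = "GATTACA" + "C" + distal
allele_t = "GATTACA" + "T" + distal
```

Because the incorporated base is the complement of X, reading the label on the extended primer identifies the allele: the "C" template yields a G and the "T" template yields an A.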

The invention further concerns the embodiment of the
above methods wherein the template-dependent extension of the
primer is conducted in the presence of at least two
dideoxynucleotide triphosphate derivatives selected from the
group consisting of ddATP, ddTTP, ddCTP and ddGTP, but in the
absence of dATP, dTTP, dCTP and dGTP.

The invention particularly concerns the sub-embodiments
of the above methods wherein the nucleic acid of the sample is
amplified in vitro prior to the incubation, and/or the primer is
immobilized to a solid support.

The invention further concerns the embodiment of the 
above methods wherein a non-invasive swab is used to collect 
the sample of DNA. 

The invention further provides a method for determining
the probability that a target horse will have a particular trait,
which comprises the steps:

A) determining the identity of a single nucleotide present
at a polymorphic site of an equine single nucleotide
polymorphism, and being present in more than 51% of a
set of reference horses;

B) determining whether a single nucleotide present at a
polymorphic site of a corresponding single nucleotide
polymorphism of the target horse has the same identity
as the single nucleotide present at the polymorphic site
of the 51% of reference horses exhibiting the trait;

C) using the determination of step B to establish the
probability that the target horse will have the
particular trait.

The invention further provides a method for creating a
genetic map of unique sequence equine polymorphisms which
comprises the steps:

A) identifying at least one pair of inter-breeding reference
horses, wherein each of the pairs of horses is
characterized by having a first and a second reference
horse,

the first reference horse having:

two alleles (i) and (ii), the alleles each being
single nucleotide polymorphic alleles having a
single nucleotide polymorphic site;

the second reference horse having:

a corresponding allele (i') to the allele (i) of the
first reference horse, wherein the allele (i') has a
single nucleotide polymorphic site, and wherein
the single nucleotide present at the polymorphic
site of the allele (i') differs from the single
nucleotide present at the polymorphic site of the
allele (i) of the first reference horse, and

B) identifying in a progeny of at least one of the pairs of
inter-breeding reference horses the single nucleotide
present at a single nucleotide polymorphic site of a
corresponding allele of the alleles (i) and (i'), and the
single nucleotide present at a single nucleotide
polymorphic site of a corresponding allele of the alleles
(ii) and (ii'); and

C) determining the extent of genetic linkage between the
alleles (i) and (ii), to thereby create the genetic map.

The invention further provides a method for predicting
whether a target horse will exhibit a predetermined trait which
comprises the steps:

A) identifying one or more alleles associated with the
trait, each allele being a single nucleotide polymorphic
allele having a single nucleotide polymorphic site;

B) determining for each of the single nucleotide
polymorphic alleles, a nucleotide present at the allele's
polymorphic site in a reference horse exhibiting the
trait, to thereby define a set of single nucleotides at a
set of polymorphic sites that are present in a reference
horse exhibiting the trait;

C) determining the identity of single nucleotides present at
corresponding single nucleotide polymorphic alleles of
the target horse; and

D) comparing the identity of the single nucleotides present
at the polymorphic sites of the polymorphisms of the
reference animal with the single nucleotides present at
the corresponding single nucleotide polymorphic alleles
of the target horse.
The invention further provides a method for identifying a
single nucleotide polymorphic site which comprises:

A) isolating a fragment of genomic DNA of a reference
organism;

B) sequencing the fragment of DNA to thereby determine
the nucleotide sequence of a segment of the fragment,
the segment being of a length sufficient to define the
nucleotide sequence of a pair of oligonucleotide primers
capable of mediating the specific amplification of the
fragment;

C) using the oligonucleotide primers to mediate the
specific amplification of DNA obtained from a plurality
of other organisms of the same species as the reference
organism; and

D) determining the nucleotide sequences of the amplified
DNA molecules of step C, and comparing the sequence of
the amplified molecules with the sequence of the
fragment of the reference organism to thereby identify
a single nucleotide polymorphic site.
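The comparison in step D can be sketched as a column-by-column scan. The sketch assumes that the amplified sequences have already been aligned to the reference fragment and have the same length; the sequences and the function name `find_snp_sites` are hypothetical.

```python
def find_snp_sites(reference, others):
    """Return {position: sorted nucleotides observed} for each variable position.

    `reference` is the sequenced fragment of the reference organism and
    `others` are the amplified sequences from other organisms of the same
    species, assumed to be pre-aligned and of equal length.
    """
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("sequences must be aligned to the same length")
    sites = {}
    for pos, ref_base in enumerate(reference):
        observed = {ref_base} | {seq[pos] for seq in others}
        if len(observed) > 1:          # more than one nucleotide seen here
            sites[pos] = sorted(observed)
    return sites


# Invented sequences: a single C/T difference at zero-based position 4.
reference_fragment = "ACGTCAGT"
amplified = ["ACGTCAGT", "ACGTTAGT", "ACGTTAGT"]
snp_sites = find_snp_sites(reference_fragment, amplified)
```

Positions flanked on both sides by columns that show no variation correspond to the "invariant" flanking sequences used to define a SNP.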

The invention also includes a method for interrogating a 
polymorphic region of a human single nucleotide 
polymorphism of a target human, the method comprising: 

A) selecting a known human single nucleotide 
polymorphism for interrogation; 

B) identifying the sequence of at least one oligonucleotide 
that flanks the selected single nucleotide 
polymorphism; the identified sequence being of a length 
sufficient to permit the identification of primers 
capable of being used to effect the specific 
amplification of the flanking oligonucleotide and the 
polymorphism; 

C) using the primers to effect the amplification of the 
flanking oligonucleotide and the polymorphism of the 
single nucleotide polymorphism of the target human; and 

D) interrogating the single nucleotide polymorphism of the 
amplified polymorphism by genetic bit analysis. 

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 illustrates the preferred method for cloning
random genomic fragments. Genomic DNA is size fractionated,
and then introduced into a plasmid vector, in order to obtain
random clones. PCR primers are designed, and used to sequence
the inserted genomic sequences.

Figure 2 illustrates the data generated by the preferred
method for identifying new polymorphic sequences, which is
cycle sequencing of a random genomic fragment.

Figure 3 illustrates the RFLP method for screening random
clones for polymorphic sequences. After the initial optimization
of PCR conditions (top panel), amplified material is cleaved with
several restriction enzymes, and the resulting profiles are
analyzed (middle panels). A population study is then performed
to determine allelic frequencies.

Figure 4 shows a graph of the probability that two
individuals will have identical genotypes with given panels of
genetic markers. The number of tests employed is plotted on the
abscissa while the cumulative probability of non-identity is
plotted on the ordinate. The horizontal line indicates 0.95
probability of non-identity. Legend: o indicates the extrapolated
prototype; x indicates 3 alleles (51%, 34%, 15%); triangle
indicates 2 alleles (79%, 21%).
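The kind of curve described for Figure 4 can be approximated with a short calculation. This sketch makes assumptions that the text does not state: genotype frequencies follow Hardy-Weinberg proportions, markers are independent, and every marker in a panel has the same allele frequencies. It is not necessarily the computation used to produce the figure, and all function names are hypothetical.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Genotype frequencies at one locus, assuming Hardy-Weinberg proportions."""
    freqs = []
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        p, q = allele_freqs[i], allele_freqs[j]
        freqs.append(p * p if i == j else 2 * p * q)
    return freqs


def match_probability(allele_freqs):
    """Probability that two unrelated individuals share a genotype at one locus."""
    return sum(g * g for g in genotype_frequencies(allele_freqs))


def cumulative_non_identity(allele_freqs, n_markers):
    """Probability that two individuals differ at one or more of n independent markers."""
    return 1.0 - match_probability(allele_freqs) ** n_markers


def markers_needed(allele_freqs, threshold=0.95, max_markers=1000):
    """Smallest number of markers whose cumulative non-identity reaches the threshold."""
    for n in range(1, max_markers + 1):
        if cumulative_non_identity(allele_freqs, n) >= threshold:
            return n
    raise ValueError("threshold not reached within max_markers")


two_allele = (0.79, 0.21)          # allele frequencies from the figure legend
three_allele = (0.51, 0.34, 0.15)  # allele frequencies from the figure legend
```

Under these assumptions, markers with more evenly distributed alleles have a lower per-locus match probability, so fewer of them are needed to cross the 0.95 line.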
Figure 5 shows a graph of the probability that given
panels of 20 genetic markers will exclude a random alleged
father in a paternity suit in which the mother is not in question.
The number of tests employed is plotted on the abscissa while
the cumulative probability of exclusion is plotted on the
ordinate. The horizontal line indicates 0.95 probability of
exclusion. The legend is as in Figure 4.

Figure 6 uses the SNP identified in clone 177-2 to 
illustrate the organization of the sequences in Table 1. 

Figure 7 illustrates the preferred method for genotyping
SNPs. The seven steps illustrate how GBA can be performed
starting with a biological sample.

Figures 8A and 8B illustrate how horse parentage data
appears at the microtiter plate level.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

I. The Single Nucleotide Polymorphisms of the
Present Invention and The Advantages of their Use
in Genetic Analysis

A. The Attributes of the Polymorphisms

The particular gene sequences of interest to the present
invention comprise "single nucleotide polymorphisms." A
"polymorphism" is a variation in the DNA sequence of some
members of a species. The genomes of animals and plants
naturally undergo spontaneous mutation in the course of their
continuing evolution (Gusella, J.F., Ann. Rev. Biochem. 55:831-854
(1986)). The majority of such mutations create
polymorphisms. The mutated sequence and the initial sequence
co-exist in the species' population. In some instances, such
co-existence is in stable or quasi-stable equilibrium. In other
instances, the mutation confers a survival or evolutionary
advantage to the species, and accordingly, it may eventually (i.e.
over evolutionary time) be incorporated into the DNA of every
member of that species.

A polymorphism is thus said to be "allelic," in that, due to
the existence of the polymorphism, some members of a species
may have the unmutated sequence (i.e. the original "allele")
whereas other members may have a mutated sequence (i.e. the
variant or mutant "allele"). In the simplest case, only one
mutated sequence may exist, and the polymorphism is said to be
diallelic. Diallelic polymorphisms are the most common and the
preferred polymorphisms of the present invention. The
occurrence of alternative mutations can give rise to triallelic,
etc. polymorphisms. An allele may be referred to by the
nucleotide(s) that comprise the mutation. Thus, for example, in
Table 1, clone 177-2 (SEQ ID NO:1 and SEQ ID NO:2) illustrates
the sequence of one strand of a diallelic polymorphism in which
one allele has a "C" and the other allele has a "T" at the
polymorphic site.

The present invention is directed to a particular class of
allelic polymorphisms, and to their use in genotyping a plant or
animal. Such allelic polymorphisms are referred to herein as
"single nucleotide polymorphisms," or "SNPs." "Single nucleotide
polymorphisms" are defined by the following attributes. A
central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single
nucleotide, which is the site of variation between allelic
sequences. A second characteristic of an SNP is that its
polymorphic site "X" is preferably preceded by and followed by
"invariant" sequences of the allele. The polymorphic site of the
SNP is thus said to lie "immediately" 3' to a "5'-proximal"
invariant sequence, and "immediately" 5' to a "3'-distal"
invariant sequence. Such sequences flank the polymorphic site.

As used herein, a sequence is said to be an "invariant"
sequence of an allele if the sequence does not vary in the
population of the species, and if mapped, would map to a
"corresponding" sequence of the same allele in the genome of
every member of the species population. Two sequences are said
to be "corresponding" sequences if they are analogs of one
another obtained from different sources. The gene sequences
that encode hemoglobin in two humans illustrate "corresponding"
allelic sequences. The definition of "corresponding alleles"
provided herein is intended to clarify, but not to alter, the
meaning of that term as understood by those of ordinary skill in
the art. Each row of Table 1 shows the identity of the
nucleotide of the polymorphic site of "corresponding" equine
alleles, as well as the invariant 5'-proximal and 3'-distal
sequences that are also attributes of that SNP. "Corresponding
alleles" are illustrated in Table 5 with regard to human alleles.
Each row of Table 5 shows the identity of the nucleotide of the
polymorphic site of "corresponding" human alleles, as well as
the invariant 5'-proximal and 3'-distal sequences that are also
attributes of that SNP.

Since genomic DNA is double-stranded, each SNP can be
defined in terms of either strand. Thus, for every SNP, one
strand will contain an immediately 5'-proximal invariant
sequence and the other will contain an immediately 3'-distal
invariant sequence. In the preferred embodiment, wherein a
SNP's polymorphic site, "X," is a single nucleotide, each strand of
the double-stranded DNA of the SNP will contain both an
immediately 5'-proximal invariant sequence and an immediately
3'-distal invariant sequence.
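The two-strand description of a SNP can be made concrete with a short sketch. It assumes a SNP is represented as a triple (5'-proximal flank, X, 3'-distal flank) written 5'->3' on one strand; the sequences and the function name `opposite_strand` are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def opposite_strand(five_prime, x, three_prime):
    """Describe the same SNP on the complementary strand.

    The complement of the 3'-distal flank becomes the new 5'-proximal flank,
    the polymorphic nucleotide is complemented, and the complement of the
    5'-proximal flank becomes the new 3'-distal flank.
    """
    return (
        reverse_complement(three_prime),
        COMPLEMENT[x],
        reverse_complement(five_prime),
    )


# Invented flanking sequences around a "C" allele.
top = ("GATTACA", "C", "TTGAC")
bottom = opposite_strand(*top)
```

Applying `opposite_strand` twice returns the original triple, which reflects the statement that either strand defines the same SNP.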

Although the preferred SNPs of the present invention
involve a substitution of one nucleotide for another at the SNP's
polymorphic site, SNPs can also be more complex, and may
comprise a deletion of a nucleotide from, or an insertion of a
nucleotide into, one of two corresponding sequences. For
example, a particular gene sequence may contain an A in a
particular polymorphic site in some animals, whereas in other
animals a single or multiple base deletion might be present at
that site. Although the preferred SNPs of the present invention
have both an invariant proximal sequence and invariant distal
sequence, SNPs may have only an invariant proximal or only an
invariant distal sequence.

Nucleic acid molecules having a sequence
complementary to that of an immediately 3'-distal invariant
sequence of a SNP can, if extended in a "template-dependent"
manner, form an extension product that would contain the SNP's
polymorphic site. A preferred example of such a nucleic acid
molecule is a nucleic acid molecule whose sequence is the same
as that of a 5'-proximal invariant sequence of the SNP.
"Template-dependent" extension refers to the capacity of a
polymerase to mediate the extension of a primer such that the
extended sequence is complementary to the sequence of a
nucleic acid template. A "primer" is a single-stranded
oligonucleotide or a single-stranded polynucleotide that is
capable of being extended by the covalent addition of a
nucleotide in a "template-dependent" extension reaction. In
order to possess such a capability, the primer must have a
3'-hydroxyl terminus, and be hybridized to a second nucleic acid
molecule (i.e. the "template"). A primer is typically 11 bases or
longer; most preferably, a primer is 20 bases; however, primers
of shorter or greater length may suffice. A "polymerase" is an
enzyme that is capable of incorporating nucleoside
triphosphates to extend a 3'-hydroxyl group of a nucleic acid
molecule, if that molecule has hybridized to a suitable template
nucleic acid molecule. Polymerase enzymes are discussed in
Watson, J.D., In: Molecular Biology of the Gene, 3rd Ed., W.A.
Benjamin, Inc., Menlo Park, CA (1977), which reference is
incorporated herein by reference, and similar texts. Other
polymerases such as the large proteolytic fragment of the DNA
polymerase I of the bacterium E. coli, commonly known as
"Klenow" polymerase, E. coli DNA polymerase I, and
bacteriophage T7 DNA polymerase, may also be used to perform
the method described herein. Nucleic acids having the same
sequence as that of the immediately 3'-distal invariant sequence
of a SNP can be ligated in a template-dependent fashion to a
primer that has the same sequence as that of the immediately
5'-proximal sequence that has been extended by one nucleotide in a
template-dependent fashion.

B. The Advantages of Using SNPs in Genetic
Analysis

The single nucleotide polymorphic sites of the present
invention can be used to analyze the DNA of any plant or animal.
Such sites are particularly suitable for analyzing the genome of
mammals, including humans, non-human primates, domestic
animals (such as dogs, cats, etc.), farm animals (such as cattle,
sheep, etc.) and other economically important animals, in
particular, horses. They may, however, be used with regard to
other types of animals, particularly birds (such as chickens,
turkeys, etc.). SNPs have several salient advantages over RFLPs,
STRs and VNTRs.

First, SNPs occur at greater frequency (approximately
10-100 fold greater), and with greater uniformity than RFLPs and
VNTRs. The greater frequency of SNPs means that they can be
more readily identified than the other classes of polymorphisms.
The greater uniformity of their distribution permits the
identification of SNPs "nearer" to a particular trait of interest.
The combined effect of these two attributes makes SNPs
extremely valuable. For example, if a particular trait (e.g.
predisposition to cancer) reflects a mutation at a particular
locus, then any polymorphism that is linked to the particular
locus can be used to predict the probability that an individual
will exhibit that trait.

The value of such a prediction is determined in part by the
distance between the polymorphism and the locus. Thus, if the
locus is located far from any repeated tandem nucleotide
sequence motifs, VNTR analysis will be of very limited value.
Similarly, if the locus is far from any detectable RFLP, an RFLP
analysis would not be accurate. However, since the SNPs of the
present invention are present approximately once every 300
bases in the mammalian genome, and exhibit uniformity of
distribution, a SNP can, statistically, be found within 150 bases
of any particular genetic lesion or mutation. Indeed, the
particular mutation may itself be an SNP. Thus, where such
locus has been sequenced, the variation in that locus' nucleotide
is determinative of the trait in question.
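The step from "once every 300 bases" to "within 150 bases" can be checked with a small calculation. The sketch assumes an idealized genome segment in which SNPs are spaced exactly 300 bases apart; real SNPs are not perfectly evenly spaced, so the 150-base figure is a statistical expectation rather than a guarantee. The function name is hypothetical.

```python
def max_distance_to_nearest_snp(spacing, segment_length):
    """Largest distance from any position to its nearest SNP.

    SNPs are placed at 0, spacing, 2*spacing, ... up to segment_length,
    which models perfectly uniform distribution.
    """
    snp_positions = range(0, segment_length + 1, spacing)
    return max(
        min(abs(position - snp) for snp in snp_positions)
        for position in range(segment_length + 1)
    )


# A locus midway between two SNPs is the worst case: half the spacing.
worst_case = max_distance_to_nearest_snp(spacing=300, segment_length=3000)
```

With evenly spaced sites, the worst case is a locus exactly midway between two neighboring SNPs, which is half the spacing.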

Second, SNPs are more stable than other classes of
polymorphisms. Their spontaneous mutation rate is
approximately 10^-9, approximately 1,000 times lower
than that of VNTRs. Significantly, VNTR-type polymorphisms are
characterized by high mutation rates.

Third, SNPs have the further advantage that their allelic
frequency can be inferred from the study of relatively few
representative samples. These attributes of SNPs permit a much
higher degree of genetic resolution of identity, paternity
exclusion, and analysis of an animal's predisposition for a
particular genetic trait than is possible with either RFLP or
VNTR polymorphisms.

Fourth, SNPs reflect the highest possible definition of
genetic information - nucleotide position and base identity.
Despite providing such a high degree of definition, SNPs can be
detected more readily than either RFLPs or VNTRs, and with
greater flexibility. Indeed, because DNA is double-stranded, the
complementary strand of the allele can be analyzed to confirm
the presence and identity of any SNP.

The flexibility with which an identified SNP can be
characterized is a salient feature of SNPs. VNTR-type
polymorphisms, for example, are most easily detected through
size fractionation methods that can discern a variation in the
number of the repeats. RFLPs are most easily detected by size
fractionation methods following restriction digestion.

In contrast, SNPs can be characterized using any of a
variety of methods. Such methods include the direct or indirect
sequencing of the site, the use of restriction enzymes where the
respective alleles of the site create or destroy a restriction
site, the use of allele-specific hybridization probes, the use of
antibodies that are specific for the proteins encoded by the
different alleles of the polymorphism, or by other biochemical
interpretation.

The "Genetic Bit Analysis" ("GBA") method disclosed by
Goelet, P. et al. (WO 92/15712, herein incorporated by
reference), and discussed below, is a preferred method for
detecting the single nucleotide polymorphisms of the present
invention. GBA is a method of polymorphic site interrogation in
which the nucleotide sequence information surrounding the site
of variation in a target DNA sequence is used to design an
oligonucleotide primer that is complementary to the region
immediately adjacent to, but not including, the variable
nucleotide in the target DNA. The target DNA template is
selected from the biological sample and hybridized to the
interrogating primer. This primer is extended by a single
labeled dideoxynucleotide using DNA polymerase in the presence
of two, and preferably all four, chain-terminating nucleoside
triphosphate precursors. Cohen, D. et al. (PCT Application
WO 91/02087) describe a related method of genotyping.
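
The primer design and single-base extension just described can be
sketched in a few lines. This is an illustrative model only, not
the protocol of WO 92/15712: the function names, the example
template and the short primer length are assumptions chosen for
the example, and the template is written 5'->3'.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def design_gba_primer(template, site, length):
    """Primer (5'->3') that anneals immediately adjacent to, but not
    including, the variable base at index `site` of the template."""
    flank = template[site + 1: site + 1 + length]
    if len(flank) < length:
        raise ValueError("not enough flanking sequence for the primer")
    return reverse_complement(flank)


def incorporated_terminator(template, site):
    """The single dideoxynucleotide added to the primer's 3' end."""
    return COMPLEMENT[template[site]]


# Hypothetical template with a G at the variable position (index 3).
template = "AACGTTGCATCCGA"
print(design_gba_primer(template, 3, 8))   # GGATGCAA
print(incorporated_terminator(template, 3))  # C
```

Because the primer stops one base short of the variable site, the
identity of the single terminator added reports the base at that
site directly.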

Recently, several primer-guided nucleotide incorporation
procedures for assaying polymorphic sites in DNA have been
described (Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784
(1989); Sokolov, B. P., Nucl. Acids Res. 18:3671 (1990); Syvanen,
A.-C., et al., Genomics 8:684-692 (1990); Kuppuswamy, M.N. et
al., Proc. Natl. Acad. Sci. (U.S.A.) 88:1143-1147 (1991); Prezant,
T.R. et al., Hum. Mutat. 1:159-164 (1992); Ugozzoli, L. et al.,
GATA 9:107-112 (1992); Nyren, P. et al., Anal. Biochem.
208:171-175 (1993)). These methods differ from GBA in that
they all rely on the incorporation of labeled deoxynucleotides to
discriminate between bases at a polymorphic site. In such a
format, since the signal is proportional to the number of
deoxynucleotides incorporated, polymorphisms that occur in runs
of the same nucleotide can result in signals that are
proportional to the length of the run (Syvanen, A.-C., et al., Amer.
J. Hum. Genet. 52:46-59 (1993)). Such a range of locus-specific
signals could be more complex to interpret, especially for
heterozygotes, compared to the simple, ternary (2:0, 1:1, or 0:2)
class of signals produced by the GBA method. In addition, for
some loci, incorporation of an incorrect deoxynucleotide can
occur even in the presence of the correct dideoxynucleotide
(Kornher, J. S. et al., Nucl. Acids Res. 17:7779-7784 (1989)).
Such deoxynucleotide misincorporation events may be due to the
Km of the DNA polymerase for the mispaired deoxy- substrate
being comparable, in some sequence contexts, to the relatively
poor Km of even a correctly base paired dideoxy- substrate
(Kornberg, A., et al., In: DNA Replication, 2nd Edition, W.H.
Freeman and Co., New York (1992); Tabor, S. et al., Proc. Natl.
Acad. Sci. (U.S.A.) 86:4076-4080 (1989)). This effect would
contribute to the background noise in the polymorphic site
interrogation.
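
The ternary (2:0, 1:1, or 0:2) signal classes mentioned above lend
themselves to a simple calling rule. The sketch below is
hypothetical: the function name and the width of the heterozygote
window are assumptions made for illustration, not values taken
from the GBA method.

```python
def call_genotype(signal_1, signal_2, het_window=0.25):
    """Classify two allele-specific signals as 2:0, 1:1 or 0:2.

    `het_window` (an assumed tolerance) is how far the allele-1
    fraction may deviate from 0.5 and still be called 1:1.
    """
    total = signal_1 + signal_2
    if total <= 0:
        raise ValueError("no signal to classify")
    fraction_1 = signal_1 / total
    if abs(fraction_1 - 0.5) <= het_window:
        return "1:1"
    return "2:0" if fraction_1 > 0.5 else "0:2"


print(call_genotype(100, 3))   # 2:0
print(call_genotype(48, 52))   # 1:1
print(call_genotype(2, 90))    # 0:2
```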

II. Methods for Discovering Novel Polymorphic Sites 


A preferred method for discovering polymorphic sites
involves comparative sequencing of genomic DNA fragments
from a number of haploid genomes. In the preferred embodiment,
illustrated in Figure 1, such sequencing is performed by
preparing a random genomic library that contains 0.5-3 kb
fragments of DNA derived from one member of a species.
Sequences of these recombinants are then used to facilitate PCR
sequencing of a number of randomly selected individuals of that
species at the same genomic loci.

From such genomic libraries (typically of approximately
50,000 clones), several hundred (200-500) individual clones are
purified, and the sequences of the termini of their inserts are
determined. Only a small amount of terminal sequence data
(100-200 bases) need be obtained to permit PCR amplification
of the cloned region. The purpose of the sequencing is to obtain
enough sequence information to permit the synthesis of primers
suitable for mediating the amplification of the equivalent
fragments from genomic DNA samples of other members of the
species. Preferably, such sequence determinations are
performed using cycle sequencing methodology.

The primers are used to amplify DNA from a panel of
randomly selected members of the target species. The number
of members in the panel determines the lowest frequency of the
polymorphisms that are to be isolated. Thus, if six members are
evaluated, a polymorphism that exists at a frequency of, for
example, 0.01 might not be identified. In an illustrative, but
oversimplified, mathematical treatment, a sampling of six
members would be expected to identify only those
polymorphisms that occur at a frequency of greater than about
0.08 (i.e. 1.0 total frequency divided by 6 members divided by 2
alleles per genome). Thus, if one desires the identification of
less frequent polymorphisms, a greater number of panel
members must be evaluated.
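
The oversimplified treatment above can be written out as a short
calculation. The first two functions restate the arithmetic of the
text; the third is an addition not found in the text, a standard
estimate that assumes each sampled allele is drawn independently.
All function names are illustrative.

```python
import math


def min_detectable_frequency(panel_size, alleles_per_member=2):
    """1.0 divided by members divided by alleles per genome."""
    return 1.0 / (panel_size * alleles_per_member)


def panel_size_for(frequency, alleles_per_member=2):
    """Smallest panel whose threshold reaches `frequency`."""
    return math.ceil(1.0 / (frequency * alleles_per_member))


def chance_of_sampling(frequency, panel_size, alleles_per_member=2):
    """Probability of seeing the allele at least once (assumes
    independent draws; not part of the original treatment)."""
    draws = panel_size * alleles_per_member
    return 1.0 - (1.0 - frequency) ** draws


print(round(min_detectable_frequency(6), 3))   # 0.083
print(round(chance_of_sampling(0.01, 6), 3))   # 0.114
```

Under these assumptions a panel of six would sample an allele of
frequency 0.01 only about one time in nine, which is consistent
with the statement that such a polymorphism might not be
identified.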

Cycle sequence analysis (Mullis, K. et al., Cold Spring
Harbor Symp. Quant. Biol. 51:263-273 (1986); Erlich H. et al.,
European Patent Appln. 50,424; European Patent Appln. 84,796,
European Patent Application 258,017, European Patent Appln.
237,362; Mullis, K., European Patent Appln. 201,184; Mullis K. et
al., U.S. Patent No. 4,683,202; Erlich, H., U.S. Patent No.
4,582,788; and Saiki, R. et al., U.S. Patent No. 4,683,194) is
facilitated through the use of automated DNA sequencing
instruments and software (Applied Biosystems, Inc.).
Differences between sequences of different animals can thereby
be identified and confirmed by inspecting the relevant portion of
the chromatograms on the computer screen. Differences are
interpreted to reflect a DNA polymorphism only if the data was
available for both strands, and present in more than one haploid
example among the population of animals tested. Figure 2
illustrates the preferred method for identifying new
polymorphic sequences, which is cycle sequencing of a random
genomic fragment. The PCR fragments from five unrelated
horses were electroeluted from acrylamide gels and sequenced
using repetitive cycles of thermostable Taq DNA polymerase in
the presence of a mixture of dNTPs and fluorescent ddNTPs. The
products were then separated and analyzed using an automated
DNA sequencing instrument of Applied Biosystems, Inc. The data
was analyzed using ABI software. Differences between
sequences of different animals were identified by the software
and confirmed by inspecting the relevant portion of the
chromatograms on the computer screen. Differences are
presented as "DNA Polymorphisms" only if the data is available
for both strands and present in more than one haploid example
among the five horses tested. The top panel shows an "A"
homozygote, the middle panel an "AT" heterozygote and the
bottom panel a "T" homozygote.
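
The acceptance rule stated above (data on both strands, and the
variant present in more than one haploid example) can be sketched
as a filter. This is a hypothetical simplification: genotypes are
given as two-letter diploid calls, and the "difference" is taken
to be the second most common allele in the panel.

```python
from collections import Counter


def is_candidate_polymorphism(genotypes, both_strands_read):
    """Apply the two acceptance criteria to a panel of diploid
    calls such as ["AA", "AT", "TT"]."""
    if not both_strands_read:
        return False
    allele_counts = Counter("".join(genotypes))
    if len(allele_counts) < 2:
        return False  # no difference observed at this position
    # Haploid copies carrying the second most common allele.
    variant_copies = sorted(allele_counts.values())[-2]
    return variant_copies > 1


# Five animals, as in the horse example (the calls are invented).
print(is_candidate_polymorphism(["AA", "AT", "TT", "AA", "AA"], True))  # True
print(is_candidate_polymorphism(["AA", "AA", "AT", "AA", "AA"], True))  # False
```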

Despite the randomized nature of such a search for
polymorphisms, such sequencing and comparison of random DNA
clones is readily able to identify suitable polymorphisms.
Indeed, with respect to the horse, approximately 1/400
nucleotides sequenced by these methods would be discovered as
the polymorphic site of an SNP.

The discovery of polymorphic sites can alternatively be
conducted using the strategy outlined in Figure 3. In this
embodiment, the DNA sequence polymorphisms are identified by
comparing the restriction endonuclease cleavage profiles
generated by a panel of several restriction enzymes on products
of the PCR reaction from the genomic templates of unrelated
members. Most preferably, each of the restriction
endonucleases used will have four base recognition sequences,
and will therefore allow a desirable number of cuts in the
amplified products.

The restriction digestion patterns obtained from the
genomic DNAs are preferably compared directly to the patterns
obtained from PCR products generated using the corresponding
plasmid templates. Such a comparison provides an internal
control which indicates that the amplified sequences from the
genomic and plasmid DNAs derive from equivalent loci. This
control also allows identification of primers that fortuitously
amplify repeated sequences, or multicopy loci, since these will
generate many more fragments from the genomic DNA templates
than from the plasmid templates.
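
The comparison of cleavage profiles can be illustrated with an
idealized in silico digest. In this sketch the sequences, the
four-base recognition sequence and the cut position are all
hypothetical, and digestion is assumed to go to completion.

```python
def fragment_lengths(seq, site, cut_offset=0):
    """Sorted fragment lengths after cutting `seq` at every
    occurrence of `site` (cut placed `cut_offset` bases into it)."""
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]) if b > a)


# Two invented alleles; a single base change removes the second site.
allele_1 = "AAAAGATCTTTTTTGATCCC"
allele_2 = "AAAAGATCTTTTTTGTTCCC"
print(fragment_lengths(allele_1, "GATC"))  # [4, 6, 10]
print(fragment_lengths(allele_2, "GATC"))  # [4, 16]
```

A difference between the two fragment lists flags a candidate
polymorphism, while identical lists for plasmid-derived and
genomic-derived products serve as the internal control described
above.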

III. Methods for Genotyping the Single Nucleotide
Polymorphisms of the Present Invention

Any of a variety of methods can be used to identify the
polymorphic site, "X," of a single nucleotide polymorphism of the
present invention. The preferred method of such identification
involves directly ascertaining the sequence of the polymorphic
site for each polymorphism being analyzed. This approach is
thus markedly different from the RFLP method which analyzes
patterns of bands rather than the specific sequence of a
polymorphism.

A. Sampling Methods 

Nucleic acid specimens may be obtained from an individual
of the species that is to be analyzed using either "invasive" or
"non-invasive" sampling means. A sampling means is said to be
"invasive" if it involves the collection of nucleic acids from
within the skin or organs of an animal (including, especially, a
murine, a human, an ovine, an equine, a bovine, a porcine, a
canine, or a feline animal). Examples of invasive methods
include blood collection, semen collection, needle biopsy, pleural
aspiration, etc. Examples of such methods are discussed by Kim,
C.H. et al. (J. Virol. 66:3879-3882 (1992)); Biswas, B. et al.
(Annals NY Acad. Sci. 590:582-583 (1990)); Biswas, B. et al. (J.
Clin. Microbiol. 29:2228-2233 (1991)).

In contrast, a "non-invasive" sampling means is one in
which the nucleic acid molecules are recovered from an internal
or external surface of the animal. Examples of such
"non-invasive" sampling means include "swabbing," collection of
tears, saliva, urine, fecal material, sweat or perspiration, etc.
As used herein, "swabbing" denotes contacting an
applicator/collector ("swab") containing or comprising an
adsorbent material to a surface in a manner sufficient to collect
surface debris and/or dead or sloughed off cells or cellular
debris. Such collection may be accomplished by swabbing nasal,
oral, rectal, vaginal or aural orifices, by contacting the skin or
tear ducts, by collecting hair follicles, etc.

Nasal swabs have been used to obtain clinical specimens
for PCR amplification (Olive, D.M. et al., J. Gen. Virol.
71:2141-2147 (1990); Wheeler, J.G. et al., Amer. J. Vet. Res.
52:1799-1803 (1991)). The use of hair follicles to identify VNTR
polymorphisms for paternity testing in horses has been
described by Ellegren, H. et al. (Animal Genetics 23:133-142
(1992)). The reference states that a standardized testing system
based on PCR-analyzed microsatellite polymorphisms is likely
to be an alternative to blood typing for paternity testing.

A preferred swab for the collection of DNA will comprise a
solid support, at least a portion of which is designed to adsorb
DNA. The portion designed to adsorb DNA may be of a
compressible texture, such as a "foam rubber," or the like.
Alternatively, it may be an adsorptive fibrous composition, such
as cotton, polyester, nylon, or the like. In yet another
embodiment, the portion designed to adsorb DNA may be an
abrasive material, such as a bristle or brush, or having a rough
surface. The portion of the swab that is designed to adsorb DNA
may be a combination of the above textures and compositions
(such as a compressible brush, etc.). The swab will, preferably,
be specially formed in a substantially rod-like, arrow-like or
mushroom-like shape, such that it will have a segment that can
be held by the collecting individual, and a tip or end portion
which can be placed into contact with the surface that contains
the sample DNA that is to be collected. In one embodiment, the
swab will be provided with a storage chamber, such as a plastic
or glass tube or cylinder, which may have one open end, such as a
test-tube. Alternatively, the tube may have two open ends, such
that after swabbing, the collector can pull on one end of the
swab so as to cause the other end of the swab to be withdrawn
into the tube. In yet another embodiment, the tube may have two
open ends, such that after swabbing, the tube can be converted
into a column to assist in the further processing of the collected
DNA. In one embodiment, the end or ends of the storage chamber
are self-sealing after swabbing has been accomplished.

The swab or the storage chamber may contain
antimicrobial agents at concentrations sufficient to prevent the
proliferation of microbes (bacteria, yeast, molds, etc.) during
subsequent storage or handling.

In one embodiment, the swab or storage chamber will
contain a chromogenic reagent which reacts to the presence of
DNA to yield a detectable signal that can be identified at the
time of sample collection. Most preferably, such a reagent will
comprise a minimum concentration "open-end point" assay for
DNA. Such an assay is capable of detecting concentrations of
nucleic acids that range from the minimum detection level of
the assay to the maximum saturation level of the assay.
This saturation level is adjustable, and can be increased by
decreasing the time of reaction. Preferred chromogenic
reagents include anti-DNA antibodies that are conjugated to
enzymes, diaminopimelic acid, etc.


B. Amplification-Based Analysis 

The detection of polymorphic sites in a sample of DNA may
be facilitated through the use of DNA amplification methods.
Such methods specifically increase the concentration of
sequences that span the polymorphic site, or include that site
and sequences located either distal or proximal to it. Such
amplified molecules can be readily detected by gel
electrophoresis or other means.

The most preferred method of achieving such amplification
employs PCR, using primer pairs that are capable of hybridizing
to the proximal sequences that define a polymorphism in its
double-stranded form.

In lieu of PCR, alternative methods, such as the "Ligase
Chain Reaction" ("LCR") may be used (Barany, F., Proc. Natl. Acad.
Sci. (U.S.A.) 88:189-193 (1991)). LCR uses two pairs of
oligonucleotide probes to exponentially amplify a specific
target. The sequences of each pair of oligonucleotides are
selected to permit the pair to hybridize to abutting sequences of
the same strand of the target. Such hybridization forms a
substrate for a template-dependent ligase. As with PCR, the
resulting products thus serve as a template in subsequent cycles
and an exponential amplification of the desired sequence is
obtained.

In accordance with the present invention, LCR can be
performed with oligonucleotides having the proximal and distal
sequences of the same strand of a polymorphic site. In one
embodiment, either oligonucleotide will be designed to include
the actual polymorphic site of the polymorphism. In such an
embodiment, the reaction conditions are selected such that the
oligonucleotides can be ligated together only if the target
molecule either contains or lacks the specific nucleotide that is
complementary to the polymorphic site present on the
oligonucleotide.
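
The allele dependence of the ligation step can be modeled as a
string check. The sketch below is an idealization that treats
ligation as all-or-none and requires perfect complementarity
across both abutting oligonucleotides; the sequences and function
names are invented for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (5'->3')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def can_ligate(target, oligo_5prime, oligo_3prime):
    """True if the two oligonucleotides (each written 5'->3') anneal
    to abutting sequences of the same target strand with no mismatch,
    so that a template-dependent ligase could join them."""
    joined = oligo_5prime + oligo_3prime
    return reverse_complement(joined) in target


# The 3' end of the first oligonucleotide carries the polymorphic base.
oligo_a, oligo_b = "TCTGAAT", "GGATCC"
print(can_ligate("GGATCCATTCAGA", oligo_a, oligo_b))  # True
print(can_ligate("GGATCCGTTCAGA", oligo_a, oligo_b))  # False
```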

In an alternative embodiment, the oligonucleotides will
not include the polymorphic site, such that when they hybridize
to the target molecule, a "gap" is created (see, Segev, D., PCT
Application WO 90/01069). This gap is then "filled" with
complementary dNTPs (as mediated by DNA polymerase), or by an
additional pair of oligonucleotides. Thus, at the end of each
cycle, each single strand has a complement capable of serving as
a target during the next cycle and exponential amplification of
the desired sequence is obtained.

The "Oligonucleotide Ligation Assay" ("OLA") (Landegren, U.
et al., Science 241:1077-1080 (1988)) shares certain
similarities with LCR and may also be adapted for use in
polymorphic analysis. The OLA protocol uses two
oligonucleotides which are designed to be capable of hybridizing
to abutting sequences of a single strand of a target. OLA, like
LCR, is particularly suited for the detection of point mutations.
Unlike LCR, however, OLA results in "linear" rather than
exponential amplification of the target sequence.
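
The practical difference between "linear" and exponential
amplification is easiest to see numerically. This is a rough
sketch under idealized assumptions: the per-cycle efficiency
parameter and the function names are illustrative, and real
reactions plateau well before the theoretical yield.

```python
def exponential_copies(cycles, start=1, efficiency=1.0):
    """Copies after `cycles` rounds when products themselves serve
    as templates (PCR- or LCR-like); efficiency 1.0 means doubling."""
    return start * (1.0 + efficiency) ** cycles


def linear_products(cycles, start=1):
    """Products after `cycles` rounds when only the original target
    serves as template (OLA-like), one product per target per cycle."""
    return start * cycles


print(exponential_copies(10))  # 1024.0
print(linear_products(10))     # 10
```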

Nickerson, D.A. et al. have described a nucleic acid
detection assay that combines attributes of PCR and OLA
(Nickerson, D.A. et al., Proc. Natl. Acad. Sci. (U.S.A.)
87:8923-8927 (1990)). In this method, PCR is used to achieve the
exponential amplification of target DNA, which is then detected
using OLA. In addition to requiring multiple, and separate,
processing steps, one problem associated with such
combinations is that they inherit all of the problems associated
with PCR and OLA.

Schemes based on ligation of two (or more)
oligonucleotides in the presence of nucleic acid having the
sequence of the resulting "di-oligonucleotide", thereby
amplifying the di-oligonucleotide, are also known (Wu, D.Y. et al.,
Genomics 4:560 (1989)), and may be readily adapted to the
purposes of the present invention.

Other known nucleic acid amplification procedures, such as
transcription-based amplification systems (Malek, L.T. et al.,
U.S. Patent 5,130,238; Davey, C. et al., European Patent
Application 329,822; Schuster et al., U.S. Patent 5,169,766;
Miller, H.I. et al., PCT appln. WO 89/06700; Kwoh, D. et al., Proc.
Natl. Acad. Sci. (U.S.A.) 86:1173 (1989); Gingeras, T.R. et al., PCT
application WO 88/10315), or isothermal amplification
methods (Walker, G.T. et al., Proc. Natl. Acad. Sci. (U.S.A.)
89:392-396 (1992)) may also be used.

C. Preparation of Single-Stranded DNA 


The direct analysis of the sequence of an SNP of the
present invention can be accomplished using either the
"dideoxy-mediated chain termination method," also known as the
"Sanger Method" (Sanger, F., et al., J. Molec. Biol. 94:441 (1975))
or the "chemical degradation method," also known as the
"Maxam-Gilbert method" (Maxam, A.M., et al., Proc. Natl. Acad. Sci.
(U.S.A.) 74:560 (1977), both references herein incorporated by
reference). Methods for sequencing DNA using either the
dideoxy-mediated method or the Maxam-Gilbert method are
widely known to those of ordinary skill in the art. Such methods
are, for example, disclosed in Sambrook, J., et al., Molecular
Cloning, a Laboratory Manual, 2nd Edition, Cold Spring Harbor
Press, Cold Spring Harbor, New York (1989), and in Zyskind, J.W.,
et al., Recombinant DNA Laboratory Manual, Academic Press, Inc.,
New York (1988), both herein incorporated by reference.

Where a nucleic acid sample contains double-stranded DNA
(or RNA), or where a double-stranded nucleic acid amplification
protocol (such as PCR) has been employed, it is generally
desirable to conduct such sequence analysis after treating the
double-stranded molecules so as to obtain a preparation that is
enriched for, and preferably predominantly, only one of the two
strands.

The simplest method for generating single-stranded DNA
molecules from double-stranded DNA is denaturation using heat
or alkali treatment.

Single-stranded DNA molecules may also be produced using
the single-stranded DNA bacteriophage M13 (Messing, J. et al.,
Meth. Enzymol. 101:20 (1983); see also, Sambrook, J., et al. (In:
Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, NY (1989))).

Several alternative methods can be used to generate
single-stranded DNA molecules. Gyllensten, U. et al. (Proc.
Natl. Acad. Sci. (U.S.A.) 85:7652-7656 (1988)) and Mihovilovic,
M. et al. (BioTechniques 7(1):14 (1989)) describe a method,
termed "asymmetric PCR," in which the standard "PCR" method is
conducted using primers that are present in different molar
concentrations. Higuchi, R.G. et al. (Nucleic Acids Res. 17:5865
(1985)) exemplifies an additional method for generating
single-stranded amplification products. The method entails
phosphorylating the 5'-terminus of one strand of a
double-stranded amplification product, and then permitting a
5'->3' exonuclease (such as λ exonuclease) to preferentially
degrade the phosphorylated strand.

Other methods have also exploited the nuclease resistant
properties of phosphorothioate derivatives in order to generate
single-stranded DNA molecules (Benkovic et al., U.S. Patent No.
4,521,509 (June 4, 1985); Sayers, J.R. et al., Nucl. Acids Res.
16:791-802 (1988); Eckstein, F. et al., Biochemistry
15:1685-1691 (1976); Ott, J. et al., Biochemistry 26:8237-8241
(1987)).

A discussion of the relative advantages and disadvantages
of such methods of producing single-stranded molecules is
provided by Nikiforov, T. (U.S. patent application serial no.
08/005,061, herein incorporated by reference).

Most preferably, such single-stranded molecules will be
produced using the methods described by Nikiforov, T. (U.S.
patent application serial no. 08/005,061, herein incorporated by
reference). In brief, these methods employ nuclease resistant
nucleotide derivatives, and incorporate such derivatives, by
chemical synthesis or enzymatic means, into primer molecules,
or their extension products, in place of naturally occurring
nucleotides.

Suitable nucleotide derivatives include derivatives in
which one or two of the non-bridging oxygens of the phosphate
moiety of a nucleotide have been replaced with a
sulfur-containing group (especially a phosphorothioate), an alkyl
group (especially a methyl or ethyl alkyl group), a
nitrogen-containing group (especially an amine), and/or a
selenium-containing group, etc.

Phosphorothioate deoxyribonucleotide or ribonucleotide
derivatives (e.g. a nucleoside 5'-O-1-thiotriphosphate) are the
most preferred nucleotide derivatives. Any of a variety of
chemical methods may be used to produce such phosphorothioate
derivatives (see, for example, Zon, G. et al., Anti-Canc. Drug Des.
6:539-568 (1991); Kim, S.G. et al., Biochem. Biophys. Res.
Commun. 179:1614-1619 (1991); Vu, H. et al., Tetrahedron Lett.
32:3005-3008 (1991); Taylor, J.W. et al., Nucl. Acids Res.
13:8749-8764 (1985); Eckstein, F. et al., Biochemistry
15:1685-1691 (1976); Ott, J. et al., Biochemistry 26:8237-8241
(1987); Ludwig, J. et al., J. Org. Chem. 54:631-635 (1989), all
herein incorporated by reference). Phosphorothioate nucleotide
derivatives can also be obtained commercially from Amersham
or Pharmacia.

Importantly, the selected nucleotide derivative must be
suitable for in vitro primer-mediated extension and provide
nuclease resistance to the region of the nucleic acid molecule in
which it is incorporated. In the most preferred embodiment, it
must confer resistance to exonucleases that attack
double-stranded DNA from the 5'-end (5'->3' exonucleases).
Examples of such exonucleases include bacteriophage T7 gene 6
exonuclease ("T7 exonuclease") and the bacteriophage lambda
exonuclease ("λ exonuclease"). Both T7 exonuclease and λ
exonuclease are inhibited to a significant degree by the presence
of phosphorothioate bonds so as to allow the selective degradation
of one of the strands. However, any double-strand specific,
5'->3' exonuclease can be used for this process, provided that its
activity is affected by the presence of the bonds of the nuclease
resistant nucleotide derivatives. The preferred enzyme when
using phosphorothioate derivatives is the T7 gene 6 exonuclease,
which shows maximal enzymatic activity in the same buffer
used for many DNA dependent polymerases, including Taq
polymerase. The 5'->3' exonuclease resistant properties of
phosphorothioate derivative-containing DNA molecules are
discussed, for example, in Kunkel, T.A. (In: Nucleic Acids and
Molecular Biology, Vol. 2, 124-135 (Eckstein, F. et al., eds.),
Springer-Verlag, Berlin, (1988)). The 3'->5' exonuclease
resistant properties of phosphorothioate nucleotide containing
nucleic acid molecules are disclosed in Putney, S.D., et al. (Proc.
Natl. Acad. Sci. (U.S.A.) 78:7350-7354 (1981)) and Gupta, A.P., et
al. (Nucl. Acids Res. 12:5897-5911 (1984)).

In addition to being resistant to such exonucleases, nucleic
acid molecules that contain phosphorothioate derivatives at
restriction endonuclease cleavage recognition sites are
resistant to such cleavage. Taylor, J.W., et al. (Nucl. Acids Res.
13:8749-8764 (1985)) discusses the endonuclease resistant
properties of phosphorothioate nucleotide containing nucleic
acid molecules.

The nuclease resistance of phosphorothioate bonds has
been utilized in a DNA amplification protocol (Walker, G.T. et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 89:392-396 (1992)). In the Walker
et al. method, phosphorothioate nucleotide derivatives are
installed within a restriction endonuclease recognition site in
one strand of a double-stranded DNA molecule. The presence of
the phosphorothioate nucleotide derivatives protects that strand
from cleavage, and thus results in the nicking of the unprotected
strand by the restriction endonuclease. Amplification is
accomplished by cycling the nicking and polymerization of the
strands.

Similarly, this resistance to nuclease attack has been used
as the basis for a modified "Sanger" sequencing method (Labeit,
S. et al., DNA 5:173-177 (1986)). In the Labeit et al. method,
35S-labeled phosphorothioate nucleotide derivatives were
employed in lieu of the dideoxy nucleotides of the "Sanger"
method.

In the most preferred embodiment, the phosphorothioate
derivative is included in the primer. The nucleotide derivative
may be incorporated into any position of the primer, but will
preferably be incorporated at the 5'-terminus of the primer,
most preferably adjacent to one another. Preferably, the primer
molecules will be approximately 25 nucleotides in length, and
contain from about 4% to about 100%, and more preferably from
about 4% to about 40%, and most preferably about 16%,
phosphorothioate residues (as compared to total residues). The
nucleotides may be incorporated into any position of the primer,
and may be adjacent to one another, or interspersed across all or
part of the primer.
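
For a 25-nucleotide primer, the percentages above translate into
whole residues as follows. This is a small illustrative
calculation; the function names are invented, and the asterisk
placed after a residue to mark a phosphorothioate is a notation
assumed for this example, not one used in the text.

```python
def phosphorothioate_count(primer_length, fraction):
    """Number of modified residues for a given fraction of the
    primer, rounded to the nearest whole residue."""
    return round(primer_length * fraction)


def mark_five_prime(primer, count):
    """Mark the first `count` residues at the 5'-terminus as
    adjacent phosphorothioate residues (assumed '*' notation)."""
    protected = "".join(base + "*" for base in primer[:count])
    return protected + primer[count:]


print(phosphorothioate_count(25, 0.04))   # 1
print(phosphorothioate_count(25, 0.16))   # 4
print(phosphorothioate_count(25, 0.40))   # 10
print(mark_five_prime("ACGTAC", 2))       # A*C*GTAC
```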

In one embodiment, the present invention can be used in
concert with an amplification protocol, for example, PCR. In
this embodiment, it is preferred to limit the number of
phosphorothioate bonds of the primers to about 10 (or
approximately half of the length of the primers), so that the
primers can be used in a PCR reaction without any changes to the
PCR protocol that has been established for non-modified
primers. When the primers contain more phosphorothioate
bonds, the PCR conditions may require adjustment, especially of
the annealing temperature, in order to optimize the reaction.

The incorporation of such nucleotide derivatives into DNA
or RNA can be accomplished enzymatically, using a DNA
polymerase (Vosberg, H.P. et al., Biochemistry 16:3633-3640
(1977); Burgers, P.M.J. et al., J. Biol. Chem. 254:6889-6893
(1979); Kunkel, T.A., In: Nucleic Acids and Molecular Biology, Vol.
2, 124-135 (Eckstein, F. et al., eds.), Springer-Verlag, Berlin,
(1988); Olsen, D.B. et al., Proc. Natl. Acad. Sci. (U.S.A.)
87:1451-1455 (1990); Griep, M.A. et al., Biochemistry 29:9006-9014
(1990); Sayers, J.R. et al., Nucl. Acids Res. 16:791-802 (1988)).
Alternatively, phosphorothioate nucleotide derivatives can be
incorporated synthetically into an oligonucleotide (Zon, G. et al.,
Anti-Canc. Drug Des. 6:539-568 (1991)).

The primer molecules are permitted to hybridize to a
complementary target nucleic acid molecule, and are then
extended, preferably via a polymerase, to form an extension
product. The presence of the phosphorothioate nucleotides in the
primers renders the extension product resistant to nuclease
attack. As indicated, the amplification products containing
phosphorothioate or other suitable nucleotide derivatives are
substantially resistant to "elimination" (i.e. degradation) by
"5'->3'" exonucleases such as T7 exonuclease or λ exonuclease,
and thus a 5'->3' exonuclease will be substantially incapable of
further degrading a nucleic acid molecule once it has
encountered a phosphorothioate residue.

Since the target molecule lacks nuclease resistant
residues, the incubation of the extension product and its
template - the target - in the presence of a 5'->3' exonuclease
results in the destruction of the template strand, and thereby
achieves the preferential production of the desired single
strand.
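
The selective degradation described above can be sketched as a
toy model. It assumes an idealized exonuclease that removes
residues one at a time from the 5' end and halts completely at
the first phosphorothioate; strands are written 5'->3', and an
asterisk after a residue (a notation assumed here) marks a
phosphorothioate. The sequences are invented.

```python
def exonuclease_5_to_3(strand):
    """Return what survives idealized 5'->3' digestion of `strand`.

    Digestion proceeds from the 5' end and stops at the first
    residue that is followed by '*' (a phosphorothioate).
    """
    for i in range(len(strand)):
        if i + 1 < len(strand) and strand[i + 1] == "*":
            return strand[i:]
    return ""  # no protected residue: the strand is fully degraded


extension_product = "A*C*G*T*ACGTTGCA"   # primer-derived, protected
template_strand = "TGCAACGTACGT"         # target, unprotected
print(exonuclease_5_to_3(extension_product))  # A*C*G*T*ACGTTGCA
print(repr(exonuclease_5_to_3(template_strand)))  # ''
```

Only the strand carrying the modified primer survives, which is
the preferential production of a single strand that the text
describes.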

D. Solid Phase Attachment of DNA 

The preferred method of determining the identity of the
polymorphic site of a polymorphism involves nucleic acid
hybridization. Although such hybridization can be performed in
solution (Berk, A.J., et al., Cell 12:721-732 (1977); Hood, L.E., et
al., In: Molecular Biology of Eukaryotic Cells: A Problems
Approach, Menlo Park, CA: Benjamin-Cummings, (1975); Wetmur,
J.G., Hybridization and Renaturation Kinetics of Nucleic Acids,
Ann. Rev. Biophys. Bioeng. 5:337-361 (1976); Itakura, K., et al.,
Ann. Rev. Biochem. 53:323-356 (1984)), it is preferable to
employ a solid-phase hybridization assay (see, Saiki, R.K. et al.,
Proc. Natl. Acad. Sci. (U.S.A.) 86:6230-6234 (1989); Gilham et al.,
J. Amer. Chem. Soc. 86:4982 (1964) and Kremsky et al., Nucl.
Acids Res. 15:3131-3139 (1987)).

Any of a variety of methods can be used to immobilize
oligonucleotides to the solid support. One of the most widely
used methods to achieve such an immobilization of
oligonucleotide primers for subsequent use in
hybridization-based assays consists of the non-covalent coating of
these solid phases with streptavidin or avidin and the subsequent
immobilization of biotinylated oligonucleotides (Holmstrom, K.
et al., Anal. Biochem. 209:278-283 (1993)). Another known
method (Running, J.A. et al., BioTechniques 8:276-277 (1990);
Newton, C.R. et al., Nucl. Acids Res. 21:1155-1162 (1993))
requires the pre-coating of the polystyrene or glass solid phases
with poly-L-Lys or poly L-Lys, Phe, followed by the covalent
attachment of either amino- or sulfhydryl-modified
oligonucleotides using bifunctional crosslinking reagents. Both
methods have the disadvantage of requiring the use of modified
oligonucleotides as well as a pre-treatment of the solid phase.

In another published method (Kawai, S. et al., Anal.
Biochem. 209:63-69 (1993)), short oligonucleotide probes were
ligated together to form multimers and these were ligated into a
phagemid vector. Following in vitro amplification and isolation
of the single-stranded form of these phagemids, they were
immobilized onto polystyrene plates and fixed by UV irradiation
at 254 nm. The probes immobilized in this way were then used
to capture and detect a biotinylated PCR product.

A method for the direct covalent attachment of short,
5'-phosphorylated primers to chemically modified polystyrene
plates ("Covalink" plates, Nunc) has also been published
(Rasmussen, S.R. et al., Anal. Biochem. 198:138-142 (1991)). The
covalent bond between the modified oligonucleotide and the
solid phase surface is introduced by condensation with a
water-soluble carbodiimide. This method is claimed to assure a
predominantly 5'-attachment of the oligonucleotides via their
5'-phosphates; however, it requires the use of specially
prepared, expensive plates.

Most preferably, such immobilization of oligonucleotides
(preferably between 15 and 30 bases) is accomplished using a
method that can be used directly, without the need for any
pre-treatment of commercially available polystyrene microwell
plates (ELISA plates) or microscope glass slides. Since 96-well
polystyrene plates are widely used in ELISA tests, there has
been significant interest in the development of methods for the
immobilization of short oligonucleotide primers to the wells of
these plates for subsequent hybridization assays. Also of
interest is a method for the immobilization to microscope glass
slides, since the latter are used in the so-called Slide
Immunoenzymatic Assay (SIA) (de Macario, E.C. et al.,
BioTechniques 3:138-145 (1985)).

The solid support can be glass, plastic, paper, etc. The
support can be fashioned as a bead, dipstick, test tube, etc. In a
preferred embodiment, the support will be a microtiter dish,
having a multiplicity of wells. The conventional 96-well
microtiter dishes used in diagnostic laboratories and in tissue
culture are a preferred support. The use of such a support
allows the simultaneous determination of a large number of
samples and controls, and thus facilitates the analysis.
Automated delivery systems can be used to provide reagents to
such microtiter dishes. Similarly, spectrophotometric methods
can be used to analyze the polymorphic sites, and such analysis
can be conducted using automated spectrophotometers.

One aspect of the present invention concerns a method for
immobilizing oligonucleotides for such analysis. In accordance
with the method, any of a number of commercially available
polystyrene plates can be used directly for the immobilization,
provided that they have a hydrophilic surface. Examples of
suitable plates include the Immulon 4 plates (Dynatech) and the
Maxisorp plates (Nunc). The immobilization of the
oligonucleotides to the plates is achieved simply by incubation
in the presence of a suitable salt. No immobilization takes place
in the absence of a salt, i.e., when the oligonucleotide is present
in a water solution. Examples of suitable salts are: 50-250 mM
NaCl; 30-100 mM 1-ethyl-3-(3'-dimethylaminopropyl)carbodiimide
hydrochloride (EDC), pH 6.8; 50-150 mM octyldimethylamine
hydrochloride, pH 7.0; 50-250 mM tetramethylammonium chloride.
The immobilization is achieved by incubation, preferably at room
temperature for 3 to 24 hours. After such incubation, the plates
are washed, preferably with a solution of 10 mM Tris HCl, pH 7.5,
containing 150 mM NaCl and 0.05% vol. Tween-20 (TNTw). The latter
ingredient serves the important role of blocking all free
oligonucleotide binding sites still present on the polystyrene
surface, so that no nonspecific binding of oligonucleotides can
take place during the subsequent hybridization steps. Using
radioactively labeled oligonucleotides, the amount of
immobilized oligonucleotides per well was determined to be at
least 500 fmoles. The oligonucleotides are immobilized to the
surface of the plate with sufficient stability and can only be
removed by prolonged incubations with 0.5 M NaOH solutions at
elevated temperatures. No oligonucleotide is removed by
washing the plate with water, TNTw (Tween 20), PBS, 1.5 M
NaCl, or other similar solutions.
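
As a worked illustration of the TNTw wash solution named above, the following Python sketch computes the stock volumes needed by simple dilution (C_stock x V_stock = C_final x V_final). Only the final concentrations (10 mM Tris HCl, 150 mM NaCl, 0.05% Tween-20) come from the text; the stock strengths (1 M Tris HCl, 5 M NaCl, 10% Tween-20) and the function names are assumptions chosen for the example.

```python
def stock_volume_ml(final_conc: float, stock_conc: float, final_volume_ml: float) -> float:
    """Volume of stock such that C_stock * V_stock = C_final * V_final.
    Both concentrations must use the same unit (mM with mM, or % with %)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the final solution")
    return final_conc / stock_conc * final_volume_ml


def tntw_recipe(final_volume_ml: float) -> dict:
    """Volumes (ml) of each assumed stock, plus water, for a batch of TNTw."""
    # Final concentrations are taken from the text; stock strengths are assumptions.
    recipe = {
        "1 M Tris HCl, pH 7.5": stock_volume_ml(10.0, 1000.0, final_volume_ml),
        "5 M NaCl": stock_volume_ml(150.0, 5000.0, final_volume_ml),
        "10% Tween-20": stock_volume_ml(0.05, 10.0, final_volume_ml),
    }
    recipe["water"] = final_volume_ml - sum(recipe.values())
    return recipe


one_litre = tntw_recipe(1000.0)
```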

The immobilized oligonucleotides can be used to capture
specific DNA sequences by hybridization. The hybridization is
usually carried out in a solution containing 1.5 M NaCl and 10 mM
EDTA, for 15 to 30 minutes at room temperature. Other
hybridization conditions can also be used. More than 400 fmoles
of a specific DNA sequence was found to hybridize to the
immobilized oligonucleotide in one well. This DNA, which is
bound to the initially immobilized oligonucleotide only via
Watson-Crick hydrogen bonds, can be easily removed from the
wells by a brief wash with a 0.1 M NaOH solution, without
removing the initially attached oligonucleotide from the plate.
If the captured DNA fragment is nonradioactively labeled, e.g.,
with a biotin residue, the detection can be carried out using a
suitable enzyme-linked assay.

Although no modifications have to be introduced into the
synthetic oligonucleotides, the method also allows for the
immobilization of labeled (e.g., biotinylated) oligonucleotides, if
desired. The amount of oligonucleotide that can be immobilized
in a single well of an ELISA plate by this method is at least 500
fmoles. The oligonucleotides thus immobilized onto the solid
phase can hybridize to suitable templates and also participate in
enzymatic reactions like template-directed extensions and
ligations.

For high volume testing applications, it is desirable to use
non-radioactive detection methods. Thus, the use of haptenated
dideoxynucleotides is preferred; the use of biotinylated
dideoxynucleotides is particularly preferred as such
modification would render the incorporated base detectable by
the standard avidin (or streptavidin) enzyme conjugates used in
ELISA assays. The biotinylated ddNTPs are preferably prepared
by reacting the four respective (3-aminopropyn-1-yl)nucleoside
triphosphates with sulfosuccinimidyl 6-(biotinamido)hexanoate.
Thus, (3-aminopropyn-1-yl)nucleoside 5'-triphosphates are
prepared as described by Hobbs, F.W. (J. Org. Chem. 54:3420-3422
(1989)) and by Hobbs, F.W. et al. (U.S. Patent No. 5,047,519). The
(3-aminopropyn-1-yl)nucleoside 5'-triphosphate (50 µmol) is
dissolved in 1 ml of pH 7.6, 1 M aqueous triethylammonium
bicarbonate (TEAB). Sulfosuccinimidyl 6-(biotinamido)hexanoate
sodium salt (Pierce, 55.7 mg, 100 µmol) is added and
the solution is heated to 50°C in a stoppered tube for 2 hr. The
reaction mixture is diluted to 10 ml with water and applied to a
DEAE-Sephadex A-25-120 column (1.6 x 19 cm). The column is
eluted with a linear gradient of pH 7.6 aqueous TEAB (0.1 M to
1.0 M) and the eluent monitored at 270 nm. The late-eluting
major peak is collected, stripped, and co-evaporated with
ethanol. The crude product, containing biotinylated nucleoside
triphosphate and, in some cases, contaminating starting
material, is further purified by reverse phase column
chromatography (Baker C-18 packing, 2 x 12 cm bed). The
material is loaded in 0.1 M pH 7.6 TEAB and eluted with a step
gradient of acetonitrile in 0.1 M pH 7.6 TEAB (0% to 36%, 2%
increments, 8 ml/step). In all cases, the biotinylated product is
more strongly retained and cleanly resolved from the starting
material. Product-containing fractions are pooled, stripped, and
co-evaporated with ethanol. The product is taken up in water
and the yield calculated using the absorption coefficient for the
starting nucleotide. The 1H NMR and 31P NMR spectra are
consistent with the expected structure and confirm the absence
of phosphorus-containing or nucleotide-derived impurities. The
materials are observed to be >99% pure by HPLC (Waters
Bondapak C-18, 4.6 x 250 mm, 1 ml/min, 1 to 35% CH3CN/pH
7/0.01 M triethylammonium acetate).

The synthesis of
5-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-2',3'-dideoxyuridine-5'-triphosphate
has an approximate yield of 25% (assuming ε = 12,400 at 291.5 nm);
HPLC t_R = 16.1 min.

The synthesis of
5-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-2',3'-dideoxycytidine-5'-triphosphate
has an approximate yield of 63% (assuming ε = 9,230 at 294.5 nm);
HPLC t_R = 19.4 min.

The synthesis of
7-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-7-deaza-2',3'-dideoxyadenosine-5'-triphosphate
has an approximate yield of 39% (assuming ε = 13,600 at 278.5
nm); HPLC t_R = 23.1 min.

The synthesis of
7-(3-(6-biotinamido(hexanoamido)propyn-1-yl)-7-deaza-2',3'-dideoxyguanosine-5'-triphosphate
has an approximate yield of 44% (assuming ε = 9,300 at 291 nm);
HPLC t_R = 21.2 min.
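
The yields above are stated to be calculated from the absorption coefficient of the starting nucleotide. A minimal Python sketch of such a calculation follows, assuming the Beer-Lambert law (A = ε x c x l), an absorption coefficient in units of M^-1 cm^-1, and a 1 cm path length. The absorbance reading, dilution factor, and product volume in the example are invented values chosen only so that the arithmetic reproduces the 25% figure; they are not data from the text.

```python
def product_micromoles(absorbance: float, epsilon: float, volume_ml: float,
                       dilution_factor: float = 1.0, path_cm: float = 1.0) -> float:
    """Amount of product (µmol) in `volume_ml` of solution, from the absorbance
    of a diluted aliquot, using the Beer-Lambert law A = epsilon * c * l."""
    conc_molar = absorbance * dilution_factor / (epsilon * path_cm)  # mol/L
    return conc_molar * volume_ml * 1000.0  # (mol/L * mL) is mmol; x1000 gives µmol


def percent_yield(product_umol: float, starting_umol: float) -> float:
    """Yield as a percentage of the starting nucleotide."""
    return 100.0 * product_umol / starting_umol


# Hypothetical reading: A = 0.62 for a 100-fold dilution of 2.5 ml of product,
# with epsilon = 12,400 (dideoxyuridine derivative) and 50 µmol of starting material.
amount = product_micromoles(0.62, 12400.0, 2.5, dilution_factor=100.0)
yield_pct = percent_yield(amount, 50.0)
```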

E. Solid Phase Analysis of Polymorphic Sites 
1. Polymerase-Mediated Analysis

Although the identity of the nucleotide(s) of the
polymorphic sites of the present invention can be determined in
a variety of ways, an especially preferred method exploits the
oligonucleotide-based diagnostic assay of nucleic acid sequence
variation disclosed by Goelet, P. et al. (PCT Application
WO 92/15712, herein incorporated by reference). In this assay, a
purified oligonucleotide having a defined sequence
(complementary to an immediate proximal or distal sequence of
a polymorphism) is bound to a solid support, especially a
microtiter dish. A sample, suspected to contain the target
molecule, or an amplification product thereof, is placed in
contact with the support, and any target molecules present are
permitted to hybridize to the bound oligonucleotide.

In one preferred embodiment, an oligonucleotide having a
sequence that is complementary to an immediately distal
sequence of a polymorphism is prepared using the above-described
methods (and preferably that of Nikiforov, T. (U.S.
Patent Application Serial No. 08/005,061)). The terminus of the
oligonucleotide is attached to the solid support, as described,
for example, by Goelet, P. et al. (PCT Application WO 92/15712),
such that the 3'-end of the oligonucleotide can serve as a
substrate for primer extension.

The immobilized primer is then incubated in the presence
of a DNA molecule (preferably a genomic DNA molecule) having a
single nucleotide polymorphism whose immediately 3'-distal
sequence is complementary to that of the immobilized primer.
Preferably, such incubation occurs in the complete absence of
any dNTP (i.e. dATP, dCTP, dGTP, or dTTP), but only in the
presence of one or more chain terminating nucleotide
triphosphate derivatives (such as a dideoxy derivative), and
under conditions sufficient to permit the incorporation of such a
derivative on to the 3'-terminus of the primer. As will be
appreciated, where the polymorphic site is such that only two or
three alleles exist (such that only two or three species of
dNTPs, respectively, could be incorporated into the primer
extension product), the presence of unusable nucleotide
triphosphate(s) in the reaction is immaterial. In consequence of
the incubation, and the use of only chain terminating nucleotide
derivatives, a single dideoxynucleotide is added to the
3'-terminus of the primer. The identity of that added nucleotide
is determined by, and is complementary to, the nucleotide of the
polymorphic site of the polymorphism.

In this embodiment, the nucleotide of the polymorphic site
is thus determined by assaying which of the set of labeled
nucleotides has been incorporated onto the 3'-terminus of the
bound oligonucleotide by a primer-dependent polymerase. Most
preferably, where multiple dideoxynucleotide derivatives are
simultaneously employed, different labels will be used to permit
the differential determination of the identity of the
incorporated dideoxynucleotide derivative.
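
The logic of this read-out can be sketched in Python as follows. This is an illustrative model only: it assumes ideal Watson-Crick pairing, represents each differentially labeled dideoxynucleotide by its base letter, and uses a hypothetical signal threshold to decide which labels were incorporated. The function names and the signal values are not taken from the text.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def incorporated_terminator(template_base: str) -> str:
    """Base of the single dideoxynucleotide added to the primer's 3'-terminus
    opposite `template_base` at the polymorphic site."""
    return COMPLEMENT[template_base]


def call_polymorphic_site(signal_by_base: dict, threshold: float) -> list:
    """Infer the template nucleotide(s) at the polymorphic site from the
    label signals measured for each incorporated dideoxynucleotide base."""
    incorporated = [base for base, signal in signal_by_base.items() if signal >= threshold]
    # Each incorporated base is complementary to a template allele.
    return sorted(COMPLEMENT[base] for base in incorporated)


# Hypothetical well read-out: labels for ddC and ddT are both above threshold,
# as expected for a heterozygous sample with template alleles G and A.
alleles = call_polymorphic_site({"A": 0.02, "C": 0.91, "G": 0.03, "T": 0.88}, threshold=0.5)
```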

2. Polymerase/Ligase-Mediated Analysis

In an alternative embodiment, the identity of the
nucleotide of the polymorphic site is determined using a
polymerase/ligase-mediated process. As in the above
embodiment, an oligonucleotide primer is employed that is
complementary to the immediately 3'-distal invariant sequence
of the SNP. A second oligonucleotide is tethered to the solid
phase via its 3'-end. The sequence of this oligonucleotide is
complementary to the 5'-proximal sequence of the polymorphism
being analyzed, but is incapable of hybridizing to the
oligonucleotide primer.

These oligonucleotides are incubated in the presence of
DNA containing the single nucleotide polymorphism that is to be
analyzed, and at least one 2', 5'-deoxynucleotide triphosphate.
The incubation reaction further includes a DNA polymerase and a
DNA ligase. Thus, for example, where the polymorphism of clone
177-2 (Table 1) is being evaluated, the tethered
oligonucleotide could comprise the 3'-distal sequence of SEQ ID
NO:2, and the second oligonucleotide would have the 5'-proximal
sequence of SEQ ID NO:1.

The tethered and soluble oligonucleotides are thus capable
of hybridizing to the same strand of the single nucleotide
polymorphism under analysis. The sequence considerations
cause the two oligonucleotides to hybridize to the proximal and
distal sequences of the SNP that flank the polymorphic site (X)
of the polymorphism; the hybridized oligonucleotides are thus
separated by a "gap" of a single nucleotide at the precise
position of the polymorphic site.

The presence of a polymerase and a 2', 5'-deoxynucleotide
triphosphate complementary to (X) permits ligation of the
primer extended with the complementary 2', 5'-deoxynucleotide
triphosphate to the immobilized oligo complementary to the
distal sequence; that is, a 2', 5'-deoxynucleotide triphosphate
that is complementary to the nucleotide of the polymorphic site
permits the creation of a ligatable substrate. The ligation
reaction immobilizes the 2', 5'-deoxynucleotide and the
previously soluble primer oligonucleotide to the solid support.

The identity of the polymorphic site that was opposite the
"gap" can then be determined by any of several means. In a
preferred embodiment, the 2', 5'-deoxynucleotide triphosphate of
the reaction is labeled, and its detection thus reveals the
identity of the complementary nucleotide of the polymorphic
site. Several different 2', 5'-deoxynucleotide triphosphates may
be present, each differentially labeled. Alternatively, separate
reactions can be conducted, each with a different 2',
5'-deoxynucleotide triphosphate. In an alternative
sub-embodiment, the 2', 5'-deoxynucleotide triphosphates are
unlabeled, and the second, soluble oligonucleotide is labeled.
Separate reactions are conducted, each using a different
unlabeled 2', 5'-deoxynucleotide triphosphate. The reaction that
contains the complementary nucleotide permits the ligatable
substrate to form, and is detected by detecting the
immobilization of the previously soluble oligonucleotide.
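
The separate-reaction variant described above can be sketched in Python. This is a simplified model under stated assumptions: pairing is ideal, the polymerase fills the one-nucleotide gap only when the supplied deoxynucleotide is complementary to the polymorphic site (X), and ligation (and therefore immobilization of the labeled soluble oligonucleotide) follows whenever the gap is filled. All function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gap_is_filled(site_base: str, supplied_base: str) -> bool:
    """True if the supplied deoxynucleotide is complementary to the
    polymorphic site, so that a ligatable substrate can form."""
    return COMPLEMENT[site_base] == supplied_base


def run_separate_reactions(site_base: str) -> dict:
    """One reaction per unlabeled deoxynucleotide; the value records whether
    the labeled soluble oligonucleotide became immobilized."""
    return {base: gap_is_filled(site_base, base) for base in "ACGT"}


def infer_site(immobilized_by_base: dict) -> list:
    """Read the polymorphic site back from the reactions that gave signal."""
    return sorted(COMPLEMENT[base] for base, hit in immobilized_by_base.items() if hit)


reactions = run_separate_reactions("T")
site = infer_site(reactions)
```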

F. Signal-Amplification

The sensitivity of nucleic acid hybridization detection
assays may be increased by altering the manner in which
detection is reported or signaled to the observer. Thus, for
example, assay sensitivity can be increased through the use of
detectably labeled reagents. A wide variety of such signal
amplification methods have been designed for this purpose.
Kourilsky et al. (U.S. Patent 4,581,333) describe the use of
enzyme labels to increase sensitivity in a detection assay.
Fluorescent labels (Albarella et al., EP 144914), chemical labels
(Sheldon III et al., U.S. Patent 4,582,789; Albarella et al., U.S.
Patent 4,563,417), modified bases (Miyoshi et al., EP 119448),
etc. have also been used in an effort to improve the efficiency
with which hybridization can be observed.

It is preferable to employ fluorescent, and more preferably
chromogenic (especially enzyme) labels, such that the identity
of the incorporated nucleotide can be determined in an
automated, or semi-automated manner using a
spectrophotometer.

IV. The Use of SNP Genotyping in Methods of Genetic
Analysis

A. General Considerations for Using Single
Nucleotide Polymorphisms in Genetic Analysis

The utility of the polymorphic sites of the present
invention stems from the ability to use such sites to predict the
statistical probability that two individuals will have the same
alleles for any given polymorphism.

Statistical analysis of SNPs can be used for any of a
variety of purposes. Where a particular animal has been
previously tested, such testing can be used as a "fingerprint"
with which to determine if a certain animal is, or is not, that
particular animal.

Where a putative parent or both parents of an individual
have been tested, the methods of the present invention may be
used to determine the likelihood that a particular animal is or is
not the progeny of such parent or parents. Thus, the detection
and analysis of SNPs can be used to exclude paternity of a male
for a particular individual (such as a stallion's paternity of a
particular foal), or to assess the probability that a particular
individual is the progeny of a selected female (such as a
particular foal and a selected mare).

As indicated below, the present invention permits the
construction of a genetic map of a target species. Thus, the
particular array of polymorphisms identified by the methods of
the present invention can be correlated with a particular trait,
in order to predict the predisposition of a particular animal (or
plant) to such genetic disease, condition, or trait. As used
herein, the term "trait" is intended to encompass "genetic
disease," "condition," or "characteristics." The term "genetic
disease" denotes a pathological state caused by a mutation,
regardless of whether that state can be detected or is
asymptomatic. A "condition" denotes a predisposition to a
characteristic (such as asthma, weak bones, blindness, ulcers,
cancers, heart or cardiovascular illnesses, skeleto-muscular
defects, etc.). A "characteristic" is an attribute that imparts
economic value to a plant or animal. Examples of
characteristics include longevity, speed, endurance, rate of
aging, fertility, etc.

B. Identification and Parentage Verification 

The most useful measurements for determining the power
of an identification and paternity testing system are: (i) the
"probability of identity" (p(ID)) and (ii) the "probability of
exclusion" (p(exc)). The p(ID) is the likelihood that two
random individuals will have the same genotype with respect to
a given polymorphic marker. The p(exc) is the
likelihood, with respect to a given polymorphic marker, that a
random male will have a genotype incompatible with him being
the father in an average paternity case in which the identity of
the mother is not in question. Since single genetic loci,
including loci with numerous alleles such as the major
histocompatibility region, rarely provide tests with adequate
statistical confidence for paternity testing, a desirable test
will preferably measure multiple unlinked loci in parallel.
Cumulative probabilities of identity or non-identity, and
cumulative probabilities of paternity exclusion are determined
for these multi-locus tests by multiplying the probabilities
provided by each locus.

The statistical measurements of greatest interest are: (i)
the cumulative probability of non-identity (cum p(nonID)), and
(ii) the cumulative probability of paternity exclusion (cum
p(exc)).

The formulas used for calculating these probability values
are given below. For simplicity these are given first for
2-allele loci, where one allele is termed type A and the other type
B. In such a model, four genotypes are possible: AA, AB, BA, and
BB (types AB and BA being indistinguishable biochemically). The
allelic frequency is given by the number of times A (f(A), the
frequency of A is denoted by "p") or B (f(B), the frequency of B is
denoted by "q," where q = 1-p) is found in the haploid genome.
The probability of a given genotype at a given locus is:

Homozygote:           $p(AA) = p^2$
Single Heterozygote:  $p(AB) = p(BA) = pq = p(1-p)$
Both Heterozygotes:   $p(AB+BA) = 2pq = 2p(1-p)$
Homozygote:           $p(BB) = q^2 = (1-p)^2$

The probability of identity at one locus (i.e. the probability
that two individuals, picked at random from a population will
have identical genotypes at a given locus) is given by the
equation:

$$p(\text{ID}) = (p^2)^2 + (2pq)^2 + (q^2)^2$$

The cumulative probability of identity for n loci is
therefore given by the equation:

$$\text{cum } p(\text{ID}) = \prod_{i=1}^{n} p(\text{ID}_i) = p(\text{ID}_1)\,p(\text{ID}_2)\,p(\text{ID}_3) \cdots p(\text{ID}_n)$$

The cumulative probability of non-identity for n loci (i.e.
the probability that two individuals will be different at 1 or
more loci) is given by the equation:

$$\text{cum } p(\text{nonID}) = 1 - \text{cum } p(\text{ID})$$
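
The two-allele identity calculations above can be checked numerically with a short Python sketch. It implements the stated formulas directly; the function names are illustrative, and each locus is described only by the frequency p of allele A (with q = 1 - p), with loci assumed to be unlinked so that per-locus probabilities multiply.

```python
from math import prod


def genotype_frequencies(p: float) -> dict:
    """Genotype frequencies at a 2-allele locus; AB and BA are pooled as 2pq."""
    q = 1.0 - p
    return {"AA": p * p, "AB": 2.0 * p * q, "BB": q * q}


def p_identity(p: float) -> float:
    """p(ID) = (p^2)^2 + (2pq)^2 + (q^2)^2 for one locus."""
    return sum(freq * freq for freq in genotype_frequencies(p).values())


def cum_p_nonidentity(allele_a_freqs: list) -> float:
    """cum p(nonID) = 1 - product of p(ID) over the loci."""
    return 1.0 - prod(p_identity(p) for p in allele_a_freqs)


single_locus = p_identity(0.5)
two_loci = cum_p_nonidentity([0.5, 0.5])
```

With p = 0.5 at each locus, p(ID) is 0.375 for one locus and cum p(nonID) is 0.859375 for two loci.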

The probability of parentage exclusion (representing the
probability that a random male will have a genotype, with
respect to a given locus, that makes him incompatible as the
sire in an average paternity case where the identity of the
mother is not in question) is given by the equation:

$$p(\text{exc}) = pq(1-pq)$$

The probability of non-exclusion (representing the
probability at a given locus that a random male will not be
biochemically excluded as the sire in an average paternity case)
is given by the equation:

$$p(\text{non-exc}) = 1 - p(\text{exc})$$

The cumulative probability of non-exclusion (representing
the value obtained when n loci are used) is thus:

$$\text{cum } p(\text{non-exc}) = \prod_{i=1}^{n} p(\text{non-exc}_i) = p(\text{non-exc}_1)\,p(\text{non-exc}_2)\,p(\text{non-exc}_3) \cdots p(\text{non-exc}_n)$$



The cumulative probability of exclusion (representing the
probability, using a panel of n loci, that a random male will be
biochemically excluded as the sire in an average paternity case
where the mother is not in question) is given by the equation:

$$\text{cum } p(\text{exc}) = 1 - \text{cum } p(\text{non-exc})$$
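
The exclusion calculations for two-allele loci can likewise be sketched in Python. The functions below follow the equations given above term by term; the names are illustrative, and the loci are assumed to be unlinked so that the per-locus non-exclusion probabilities can be multiplied.

```python
from math import prod


def p_exclusion(p: float) -> float:
    """p(exc) = pq(1 - pq) for a 2-allele locus with allele frequencies p and q = 1 - p."""
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def cum_p_exclusion(allele_a_freqs: list) -> float:
    """cum p(exc) = 1 - product over loci of p(non-exc), where p(non-exc) = 1 - p(exc)."""
    return 1.0 - prod(1.0 - p_exclusion(p) for p in allele_a_freqs)


one_locus = p_exclusion(0.5)
two_loci = cum_p_exclusion([0.5, 0.5])
```

With p = 0.5, a single locus gives p(exc) = 0.1875, and two such loci give cum p(exc) = 0.33984375.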

These calculations may be extended for any number of
alleles at a given locus. For example, the probability of identity
p(ID) for a 3-allele system where the alleles have the
frequencies in the population of p, q and r, respectively, is equal
to the sum of the squares of the genotype frequencies:

$$p(\text{ID}) = p^4 + (2pq)^2 + (2qr)^2 + (2pr)^2 + r^4 + q^4$$

Similarly, the probability of exclusion for a three allele
system is given by:

$$p(\text{exc}) = pq(1-pq) + qr(1-qr) + pr(1-pr) + 3pqr(1-pqr)$$

In a locus of n alleles, the appropriate binomial expansion
is used to calculate p(ID) and p(exc).

Figures 4 and 5 show how the cum p(nonID) and the
cum p(exc) increase with both the number and type of genetic
loci used. It can be seen that greater discriminatory power is
achieved with fewer markers when using three allele systems.
In Figures 4 and 5, the triangles trace the increase in probability
values with increasing numbers of loci with two alleles where
the common allele is present at a frequency of p = 0.79. The
crosses in Figures 4 and 5 show the same analysis for increasing
numbers of three-allele loci where p = 0.51, q = 0.34 and r =
0.15.
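
The comparison between two-allele and three-allele loci can be reproduced approximately with a Python sketch. It computes p(ID) as the sum of the squares of the genotype frequencies for any number of alleles, as described above, and then counts how many identical, unlinked loci are needed to reach a chosen cum p(nonID). The allele frequencies are those quoted for the figures; the 0.99 target and the function names are assumptions made for illustration and are not taken from the figures themselves.

```python
from itertools import combinations_with_replacement


def p_identity(allele_freqs: list) -> float:
    """Sum of squared genotype frequencies for a locus with any number of alleles."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    total = 0.0
    for i, j in combinations_with_replacement(range(len(allele_freqs)), 2):
        genotype_freq = allele_freqs[i] * allele_freqs[j]
        if i != j:
            genotype_freq *= 2.0  # heterozygotes arise in two orders (e.g. AB and BA)
        total += genotype_freq * genotype_freq
    return total


def loci_needed(allele_freqs: list, target_nonid: float) -> int:
    """Smallest number of identical unlinked loci with cum p(nonID) >= target."""
    p_id = p_identity(allele_freqs)
    if not 0.0 < target_nonid < 1.0 or p_id >= 1.0:
        raise ValueError("target must lie in (0, 1) and the locus must be polymorphic")
    cum_id, loci = 1.0, 0
    while 1.0 - cum_id < target_nonid:
        cum_id *= p_id
        loci += 1
    return loci


two_allele = loci_needed([0.79, 0.21], 0.99)
three_allele = loci_needed([0.51, 0.34, 0.15], 0.99)
```

Under these assumptions, the three-allele loci reach the target with fewer markers than the two-allele loci, in line with the trend described for the figures.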

The choice between whether to use loci with 2, 3 or more
alleles is however largely influenced by the above-described
biochemical considerations. A polymorphic analysis test may be
designed to score for any number of alleles at a given locus. If
allelic scoring is to be performed using gel electrophoresis,
each allele should be easily resolvable by gel electrophoresis.
Since the length variations in multiple allelic families are often
small, human DNA tests using multiple allelic families include
statistical corrections for mistaken identification of alleles.
Furthermore, although the appearance of a rare allele from a
multiple allelic system may be highly informative, the rarity of
these alleles makes accurate measurements of their frequency
in the population extremely difficult. To correct for errors in
these frequency estimates when using rare alleles, the
statistical analysis of this data must include a measure of the
cumulative effects of uncertainty in these frequency estimates.
The use of these multiple allelic systems also increases the
likelihood that new or rare alleles in the population will be
discovered during the course of large population screening. The
integrity of previously collected genetic data would be
empirically revised to reflect the discovery of a new allele.

In view of these considerations, although the use of loci
with many alleles could potentially offer some short-term
advantages (because fewer loci would need to be screened), it is
preferable to perform polymorphic analyses using loci with
fewer alleles that are: (i) more frequently represented, and (ii)
easier to measure unambiguously. Tests of this type can achieve
the same power of discrimination as tests based on more highly
polymorphic loci, provided the same total number of alleles is
collected from a series of unlinked loci.

C. Gene Mapping and Genetic Trait Analysis Using
SNPs

The polymorphisms detected in a set of individuals of the
same species (such as humans, horses, etc.), or of closely
related species, can be analyzed to determine whether the
presence or absence of a particular polymorphism correlates
with a particular trait.

To perform such polymorphic analysis, the presence or
absence of a set of polymorphisms (i.e. a "polymorphic array") is
determined for a set of the individuals, some of which exhibit a
particular trait, and some of which exhibit a mutually exclusive
characteristic (for example, with respect to horses, brittle
bones vs. non-brittle bones; maturity onset blindness vs. no
blindness; predisposition to asthma, cardiovascular disease vs.
no such predisposition). The alleles of each polymorphism of the
set are then reviewed to determine whether the presence or
absence of a particular allele is associated with the particular
trait of interest. Any such correlation defines a genetic map of
the individual's species. Alleles that do not segregate randomly
with respect to a trait can be used to predict the probability
that a particular animal will express that characteristic. For
example, if a particular polymorphic allele is present in only
20% of the members of a species that exhibit a cardiovascular
condition, then a particular member of that species containing
that allele would have a 20% probability of exhibiting such a
cardiovascular condition. As indicated, the predictive power of
the analysis is increased by the extent of linkage between a
particular polymorphic allele and a particular characteristic.
Similarly, the predictive power of the analysis can be increased
by simultaneously analyzing the alleles of multiple polymorphic
loci and a particular trait. In the above example, if a second
polymorphic allele was found to also be present in 20% of
members exhibiting the cardiovascular condition, but all of
the evaluated members that exhibited such a cardiovascular
condition had a particular combination of alleles for these first
and second polymorphisms, then a particular member containing
both such alleles would have a very high probability of
exhibiting the cardiovascular condition.

The detection of multiple polymorphic sites permits one to
define the frequency with which such sites independently
segregate in a population. If, for example, two polymorphic
sites segregate randomly, then they are either on separate
chromosomes, or are distant from one another on the same
chromosome. Conversely, two polymorphic sites that are
co-inherited at significant frequency are linked to one another on
the same chromosome. An analysis of the frequency of
segregation thus permits the establishment of a genetic map of
markers. Thus, the present invention provides a means for
mapping the genomes of plants and animals.
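
The segregation analysis described above can be illustrated with a minimal Python sketch. It assumes idealized data that the text does not specify: each transmitted gamete has been scored at two polymorphic sites, and the two parental (non-recombinant) allele combinations are known. The fraction of gametes carrying a non-parental combination is then an estimate of how often the sites are separated: a value near 0.5 suggests random segregation, and a value well below 0.5 suggests linkage. The names and the counts are hypothetical.

```python
def recombination_fraction(gametes: list, parental: set) -> float:
    """Fraction of scored gametes whose allele pair is not a parental combination."""
    if not gametes:
        raise ValueError("at least one scored gamete is required")
    recombinant = sum(1 for pair in gametes if pair not in parental)
    return recombinant / len(gametes)


parental_pairs = {("A", "G"), ("T", "C")}

# Hypothetical linked sites: 90 parental and 10 recombinant gametes.
linked = [("A", "G")] * 45 + [("T", "C")] * 45 + [("A", "C")] * 5 + [("T", "G")] * 5
# Hypothetical unlinked sites: all four combinations equally frequent.
unlinked = [("A", "G")] * 25 + [("T", "C")] * 25 + [("A", "C")] * 25 + [("T", "G")] * 25

theta_linked = recombination_fraction(linked, parental_pairs)
theta_unlinked = recombination_fraction(unlinked, parental_pairs)
```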

The resolution of a genetic map is proportional to the
number of markers that it contains. Since the methods of the
present invention can be used to isolate a large number of
polymorphic sites, they can be used to create a map having any
desired degree of resolution.

The sequencing of the polymorphic sites greatly increases
their utility in gene mapping. Such sequences can be used to
design oligonucleotide primers and probes that can be employed
to "walk" down the chromosome and thereby identify new marker
sites (Bender, W. et al., J. Supra. Molec. Struc. 10(suppl):32
(1979); Chinault, A.C. et al., Gene 5:111-126 (1979); Clarke, L. et
al., Nature 287:504-509 (1980)).

The resolution of the map can be further increased by
combining polymorphic analyses with data on the phenotype of
other attributes of the plant or animal whose genome is being
mapped. Thus, if a particular polymorphism segregates with
brown hair color, then that polymorphism maps to a locus near
the gene or genes that are responsible for hair color. Similarly,
biochemical data can be used to increase the resolution of the
genetic map. In this embodiment, a biochemical determination
(such as a serotype, isoform, etc.) is studied in order to
determine whether it co-segregates with any polymorphic site.
Such maps can be used, for example, to identify new gene
sequences or to identify the causal mutations of disease.

Indeed, the identification of the SNPs of the present
invention permits one to use complementary oligonucleotides as
primers in PCR or other reactions to isolate and sequence novel
gene sequences located on either side of the SNP. The invention
includes such novel gene sequences. The genomic sequences that
can be clonally isolated through the use of such primers can be
transcribed into RNA, and expressed as protein. The present
invention also includes such protein, as well as antibodies and
other binding molecules capable of binding to such protein.

The invention is illustrated below with respect to two of
its embodiments, horses and humans. However, because the
fundamental tenets of genetics apply irrespective of species,
such illustration is equally applicable to any other species.
Those of ordinary skill would therefore need only to directly
employ the methods of the above invention to isolate SNPs in any
other species, and to thereby conduct the genetic analysis of the
present invention.

As indicated above, LOD scoring methodology has been
developed to permit the use of RFLPs to both track the
inheritance of genetic traits, and to construct a genetic map of a
species (Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.)
83:7353-7357 (1986); Lander, S. et al., Proc. Natl. Acad. Sci. (U.S.A.)
84:2363-2367 (1987); Donis-Keller, H. et al., Cell 51:319-337
(1987); Lander, S. et al., Genetics 121:185-199 (1989)). Such
methods can be readily adapted to permit their use with the
polymorphisms of the present invention. Indeed, such
polymorphisms are superior to RFLPs and STRs in this regard.
Due to the frequency of SNPs, it is possible to readily generate a
dense genetic map. Moreover, as indicated above, the
polymorphisms of the present invention are more stable than
typical (VNTR-type) RFLP polymorphisms.
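
As a rough illustration of the LOD scoring referred to above, the following Python sketch computes a two-point LOD score for phase-known, fully informative meioses, using the standard textbook expression LOD(theta) = log10[theta^r (1 - theta)^(n - r) / 0.5^n]. The function name and the recombinant counts are hypothetical and are not taken from the specification or the cited references.

```python
import math


def lod_score(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood of the observed recombinant count at
    recombination fraction `theta` with the likelihood under free
    recombination (theta = 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1.0 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0.0:
        # theta = 0 is impossible once any recombinant has been observed
        return float("-inf")
    return math.log10(likelihood_linked / likelihood_unlinked)


if __name__ == "__main__":
    # Hypothetical data: 2 recombinants observed among 20 meioses.
    for theta in (0.05, 0.10, 0.20, 0.50):
        print(f"theta={theta:.2f}  LOD={lod_score(2, 20, theta):.3f}")
```

By convention a LOD score of 3 or more is usually taken as evidence of linkage; the same calculation applies whether the marker being scored is an RFLP or an SNP.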

The polymorphisms of the present invention comprise
direct genomic sequence information and can therefore be typed
by a number of methods. In an RFLP or STR-dependent map, the
analysis must be gel-based, and entail obtaining an
electrophoretic profile of the DNA of the target animal. In
contrast, an analysis of the polymorphisms (SNPs) of the present
invention may be performed using spectrophotometric methods,
and can readily be automated to facilitate the analysis of large
numbers of target animals.

Having now generally described the invention, the same
will be more readily understood through reference to the
following examples of the isolation and analysis of equine
polymorphisms which are provided by way of illustration, and
are not intended to be limiting of the present invention.

EXAMPLE 1 

DISCOVERY OF EQUINE POLYMORPHISMS 

As an initial step in the identification of equine
polymorphisms, small shotgun libraries were prepared from
genomic DNA isolated from peripheral blood leukocytes which
had been purified on a Ficoll-Hypaque density gradient from the
blood of a single, 15-year-old thoroughbred gelding (John Henry).
This DNA was simultaneously digested to completion with Bam
HI and Pst I and either used directly or after size fractionation
on agarose gels.

Vector pLT14 (a variant of the Stratagene plasmid
pKSM13(-)) was digested with Bam HI and Pst I and linearized
DNA was purified from an agarose gel. For both vector and
size-fractionated genomic DNA, agarose plugs were solubilized in
saturated sodium iodide and the DNA was subsequently
immobilized on glass powder. After washing, the DNA was
eluted with water and ethanol precipitated with glycogen
carrier.

Ligations with varying vector/insert ratios were
carried out with T4 DNA ligase at 4°C. E. coli strain XL1 was
transformed with ligation mixtures and plated on LB agar
containing 100 µg/ml ampicillin. Approximately 50,000 clones
were generated in several different experiments using
size-fractionated or unfractionated insert DNA. Unplated
transformed cells were stored at -70°C in 7% DMSO. Colonies
were streaked for isolation and small scale plasmid
preparations were performed to determine the size of inserted
equine DNA. Larger scale preparations were performed with
Qiagen chromatography.

The sequence of the first 200-300 nucleotides of the
genomic insert was determined by the chain terminating
dideoxynucleoside method with T7 DNA polymerase from primers
complementary to plasmid sequences. This information was
used to design synthetic oligonucleotide primers complementary
to the equine sequence to be employed in PCR reactions.

In most cases, two sets of PCR primers (generally 25-mers)
were synthesized. The first set was used to amplify,
under a standardized set of conditions, from genomic DNA. The
products of these reactions were diluted and used as template
DNA in a second PCR using nested primers slightly internal to
the original set. The products of these two reactions were
compared to those obtained using the original plasmid DNA as
template. In most cases, it was possible to obtain high quality,
single-species products using this procedure with no attempt to
optimize reaction conditions for any particular pair of primers.

Two different methods were used to screen amplified DNA
from horses for polymorphic sequences. Initially, PCR
fragments from a panel of 6 horses were digested with a panel
of restriction endonucleases having 4 base recognition sites.
The products of these reactions were analyzed by acrylamide gel
electrophoresis on 5% - 7.5% non-denaturing gels. Digestion
products which showed variability when hybridized to different
members of the panel were subjected to DNA sequence analysis.
Later, DNA sequencing was used directly to screen for
polymorphic sites. The PCR fragments from five unrelated
horses were electroeluted from acrylamide gels and sequenced
using repetitive cycles of thermostable Taq polymerase reaction
in the presence of a mixture of dNTPs and fluorescent ddNTPs.
The products were then separated and analyzed using the
automated DNA sequencing instrument of Applied Biosystems,
Inc. The data was analyzed using ABI software. Differences
between sequences of different animals were identified by the
software and confirmed by inspecting the relevant portion of the
chromatograms on the computer screen. Differences were
concluded to be a DNA polymorphism only if the data was
available for both strands, and/or present in more than one
haploid example among the five horses tested.

EXAMPLE 2

CHARACTERIZATION OF EQUINE POLYMORPHISMS

The program of identification and characterization of
polymorphic DNA sequences in randomly selected fragments was
continued such that approximately 550 plasmids have been
characterized to this level. The sequences adjacent to the
cloning sites were determined for 200 of these plasmids. Inserts
of these sequenced plasmids ranged in size from 0.25 to 3.5 kb.
Using this sequence information, oligonucleotide primers were
designed to enable PCR amplification of the same genomic region
from different horses.

In order to identify the nucleotides present at polymorphic
sites, PCR fragments from 5 horses were purified from
acrylamide gels by electroelution and completely sequenced
using Taq polymerase "Cycle" sequencing biochemistry and
automated sequencing equipment. Results from the 5 horses
were analyzed by computer and visually confirmed. DNA
sequence variants discovered by this method were scored only if
the sequence was obtained on both strands and the variant
sequence had been found in more than one haploid example. The
18 clones of Table 1 comprise a subset of identified SNPs. In
Table 1, the immediately 5'-proximal sequence, the identity of
the nucleotide of the polymorphic site, and the immediately
3'-distal sequence of each SNP are presented. For each SNP, such
sequences are shown in the horizontal rows. The sequences of
double-stranded DNA in Table 1 are presented in compliance with
the Sequence Listing requirements of the United States Patent
and Trademark Office. Thus, all sequences are presented in the
same orientation (5'->3'). The organization of the Table is
illustrated in Figure 6 with respect to an illustrative SNP, clone
177-2. This SNP has a polymorphic site capable of having either
a C or a T in one strand, and a G or A in the opposite strand. The
5'-proximal DNA sequence that immediately precedes the
polymorphic site in the C/T strand is designated as SEQ ID NO:1.
The 3'-distal sequence that immediately follows the
polymorphic site in the C/T strand is designated as SEQ ID NO:2.
The 5'-proximal DNA sequence that immediately precedes the
polymorphic site in the G/A strand is designated as SEQ ID NO:3.
The 3'-distal sequence that immediately follows the
polymorphic site in the G/A strand is designated as SEQ ID NO:4.
Bearing in mind that the sequences are written in the same
orientation (5'->3'), it will be seen that the sequences of SEQ ID
NO:1 and SEQ ID NO:4 are complementary; similarly, the
sequences of SEQ ID NO:2 and SEQ ID NO:3 are complementary. The
sequences that flank a particular polymorphic site are thus
obtained by combining the proximal sequence of one row with
the distal sequence also shown in the same row.

The present specification refers to the above sequences by
their sequence ID numbers (i.e. SEQ ID NO). To facilitate such
disclosure, algebraic notation (such as "2n+1") is employed, in
accordance with conventional algebra. Thus, the designation
"SEQ ID NO:(2n+1)" denotes SEQ ID NO:5 where n=2, and SEQ ID
NO:7 where n=3, etc.
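
The complementarity relationships described above for clone 177-2 can be checked mechanically. The short Python sketch below is an illustration only: the four flanking sequences are copied from the Sequence Listing (SEQ ID NOs:1-4), while the helper names `reverse_complement` and `allele_sequences` are hypothetical and introduced here for the example.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def allele_sequences(proximal, alleles, distal):
    """Join a 5'-proximal flank, each possible base, and the 3'-distal flank."""
    return [proximal + allele + distal for allele in alleles]


# Flanking sequences of clone 177-2, as given in the Sequence Listing.
SEQ_ID_1 = "GCAGCTCTAAGTGCTGTGGG"  # 5'-proximal, C/T strand
SEQ_ID_2 = "TGCAGAAATTCTAAGGTGTT"  # 3'-distal, C/T strand
SEQ_ID_3 = "AACACCTTAGAATTTCTGCA"  # 5'-proximal, G/A strand
SEQ_ID_4 = "CCCACAGCACTTAGAGCTGC"  # 3'-distal, G/A strand

# Both alleles of the site, written on each strand.
ct_strand = allele_sequences(SEQ_ID_1, "CT", SEQ_ID_2)
ga_strand = allele_sequences(SEQ_ID_3, "GA", SEQ_ID_4)

if __name__ == "__main__":
    print(reverse_complement(SEQ_ID_1) == SEQ_ID_4)
    print(reverse_complement(SEQ_ID_2) == SEQ_ID_3)
    print([reverse_complement(s) for s in ct_strand] == ga_strand)
```

Because every sequence is written 5'->3', the proximal flank of one strand pairs with the distal flank of the other, which is why SEQ ID NO:1 pairs with SEQ ID NO:4 rather than with SEQ ID NO:3.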

EXAMPLE 3 

ALLELIC FREQUENCY ANALYSIS OF EQUINE POLYMORPHISMS IN
SMALL POPULATION STUDIES

Small population studies (50 - 60 animals) of these DNA
sequence polymorphisms have been carried out on a number of
these polymorphic sites using Genetic Bit Analysis (GBA), the
preferred solid-phase, single nucleotide interrogation system
(Goelet, P. et al., WO 92/15712). The 7 steps of the most
preferred embodiment are illustrated in Figure 7:

Step 1: DNA preparation.

Step 2: Amplification of Target Sequence. After DNA is
prepared from the sample, a specific region of the sample
genome (locus) is amplified using the PCR. One of the PCR
primers is modified with four phosphorothioate linkages at the
5'-end.

Step 3: Exonuclease Digestion and the Generation of
Single-Stranded Template. The PCR product is digested with
exonuclease, leaving the phosphorothioated strand intact.

Step 4: Hybridization to Capture the Amplified Template.
The template strand is next hybridized to the appropriate GBA
primer that is immobilized on the surface of a microtiter well.

Step 5: Single Base Extension with Polymerase. DNA
polymerase and haptenated ddNTPs are used to extend the GBA
primer by one base in a template-dependent manner.

Step 6: Colorimetric Detection of the Extension Product.
After the template is washed away using NaOH, the haptenated
base is detected using an anti-hapten conjugate and the
appropriate colorimetric substrate.

Step 7: Computer-Assisted Interpretation of Genotype. The
colorimetric data from a number of loci is converted to an SNP
genotype for the particular individual tested.

The method is preferably conducted in the following
manner:

GBA Template Preparation. 

Amplification of genomic sequences was performed using
the polymerase chain reaction (PCR). In a first step, one hundred
nanograms of genomic DNA was used in a reaction mixture
containing each first round primer at a concentration of 2 µM and
10 mM Tris pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 0.01% gelatin, and
0.05 units per µl Taq DNA Polymerase (AmpliTaq, Perkin Elmer).

To obtain single-stranded template for use with solid-phase
immobilized primer, either of two methods may be used.
First, the amplification may be mediated using primers that
contain 4 phosphorothioate-nucleotide derivatives, as taught by
Nikiforov, T. (U.S. patent application serial no. 08/005,061).
Alternatively, a second round of PCR may be performed using
"asymmetric" primer concentrations. The products of the first
reaction are diluted 1/1000 in a second reaction. One of the
second round primers is used at the standard concentration of
2 µM while the other is used at 0.08 µM. Under these conditions,
single stranded molecules are synthesized during the reaction.

Solid phase immobilization of nucleic acids. 

For the GBA procedure, solid-phase attachment of the
template-primer complex simplifies washes, buffer exchanges,
etc., and in principle this attachment can be either via the
template or the primer. In practice, however, especially when
non gel-based detection methods are employed, attachment via
the primer is preferable. This format allows the use of
stringent washes (e.g., 0.2 N NaOH) to remove impurities and
reaction side products while retaining the haptenated
dideoxynucleotide covalently linked to the 3'-end of the primer.

Therefore, for GBA reactions in 96-well plates (Nunc
Nunclon plates, Roskilde, Denmark), the GBA primer was
covalently coupled to the plate. This was accomplished by
incubating 10 pmoles of primer having a 5'-amino group per well
in 50 µl of 3 mM sodium phosphate buffer, pH 6, 20 mM
1-ethyl-3-(3-dimethylaminopropyl)-carbodiimide (EDC) overnight at room
temperature. After coupling, the plate was washed three times
with TNTw.

GBA in Microwell Plates. 

Hybridization of single-stranded DNA to primers
covalently coupled to 96-well plates was accomplished by
adding an equal volume of 3 M NaCl, 20 mM EDTA to the
single-stranded PCR product and incubating each well with 20 µl of this
mixture at 20°C for 30 minutes. The plate was subsequently
washed three times with TNTw. Twenty µl of polymerase
extension mix containing ddNTPs (3 µM each, one of which was
biotinylated), 5 mM DTT, 7.5 mM sodium isocitrate, 5 mM MnCl2,
and 0.04 units per µl of Klenow DNA polymerase was then added
to each well and incubated for 5 minutes at room temperature.

Following the extension reaction, the plate was washed
once with TNTw. Template strands were removed by incubating
wells with 50 µl of 0.2 N NaOH for 5 minutes at room
temperature, then washing the well with another 50 µl of 0.2 N
NaOH. The plate was then washed three times with TNTw.

Incorporation of biotinylated ddNTPs was measured by an
enzyme-linked assay. Each well was incubated with 20 µl of
streptavidin-conjugated horseradish peroxidase (1/1000
dilution in TNTw of product purchased from BRL, Gaithersburg,
MD) with agitation for 30 minutes at room temperature. After
washing 5 times with TNTw, 100 µl of o-phenylenediamine (OPD,
1 mg/ml in 0.1 M citric acid, pH 4.5) (BRL) containing 0.012%
H2O2 was added to each well. The amount of bound enzyme was
determined kinetically with a Molecular Devices model "Vmax"
96-well spectrophotometer.

Figures 8A and 8B illustrate how
horse parentage data appears at the microtiter plate level. In
standard horse parentage testing, samples are arrayed 85 to a
plate (columns 1-11) plus controls (column 12). For each horse
locus the presence of the two known alleles is determined by
base specific interrogation on separate plates. The two plates
shown in Figures 8A and 8B are identical in PCR template and
GBA primer and differ only in the biotinylated ddNTP that was
used in the extension reaction (biotin-ddCTP in Figure 8A and
biotin-ddTTP in Figure 8B). Upon addition of the colorimetric
reagent (OPD), the absorbance of the resultant color was
measured in a Molecular Devices microtiter plate reader and the
raw data generated in milliOD/min per well. Gray scale
representations of the raw absorbance data for these two
plates are shown in the figures, arranged in exactly the same order
as on the microtiter plates. Gray scale intensity correlates
directly with color production. At this biallelic locus the bases
detected are C (Figure 8A) and T (Figure 8B). Approximately 40%
of horses tested to date are heterozygotes (the sample in well
A1, for example) and the remainder are homozygous for C (A2, for
example) or T (B3, for example). Synthetic template controls
include a control C homozygote (well E12), a control T
homozygote (well F12) and a control heterozygote (well G12).
Scale refers to milliOD/min at 450 nm. Most positive samples
had signals above 100 in this case. In this format, for a 28
biallelic marker panel horse parentage test, 56 such plates
would be required for complete typing of the 85 horses.

Fifty-one random, unrelated horses and three
sire/dam/foal families were chosen for study in order to
establish that a reasonable subset of the group of DNA markers
found to date was likely to provide the desired p(exc) > 0.90, and
to assess the power of the DNA markers, thereby allowing them
to be prioritized for definitive allelic frequency measurements.

PCR generated single-stranded template DNA was prepared
from the genomic DNA of each animal. This material was typed
with respect to nucleotide variants using GBA. The genotype
data obtained for each polymorphic site is summarized in Table
2. From this genotype data, allelic frequencies were determined
and used to calculate the p(exc) of each site. The cumulative
p(exc) for the group of 18 sites listed in Tables 1 and 2
is 0.955. In Tables 2-5, the genotype is indicated
as either homozygote (i.e. PP or QQ) or heterozygote (PQ).
The numbers in parentheses denote the number of times each
genotype was observed.
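
The specification does not state the formula used to derive p(exc) from the allelic frequencies. As a hedged sketch only, the Python code below assumes the conventional average exclusion probability for a codominant biallelic marker when the dam's genotype is known, p(exc) = pq(1 - pq), together with Hardy-Weinberg proportions and independent loci. The allele frequencies are those listed for the 18 sites in Table 4; the function names are hypothetical. Under these assumptions the cumulative value comes out close to the 0.955 reported above.

```python
from math import prod

# Frequency p of the first allele at each of the 18 sites (Table 4); q = 1 - p.
ALLELE_FREQ_P = {
    "177-2": 0.500, "595-3": 0.528, "090-2": 0.466, "324-1": 0.433,
    "129-1": 0.392, "007-1": 0.608, "324-2": 0.611, "177-3": 0.642,
    "595-1": 0.696, "007-3": 0.717, "459-1": 0.276, "085-1": 0.733,
    "007-2": 0.263, "474-1": 0.758, "178-1": 0.793, "595-2": 0.810,
    "177-1": 0.133, "459-2": 0.949,
}


def site_p_exc(p):
    """Average exclusion probability at one biallelic site (assumed formula)."""
    q = 1.0 - p
    return p * q * (1.0 - p * q)


def cumulative_p_exc(freqs):
    """Probability that at least one site excludes a random false sire."""
    return 1.0 - prod(1.0 - site_p_exc(p) for p in freqs)


cum_p_exc = cumulative_p_exc(ALLELE_FREQ_P.values())

if __name__ == "__main__":
    for locus, p in ALLELE_FREQ_P.items():
        print(f"{locus}: p(exc) = {site_p_exc(p):.3f}")
    print(f"cumulative p(exc) = {cum_p_exc:.3f}")
```

A site is most informative when p = q = 0.5, which is why ranking markers by allelic frequency is a reasonable way to prioritize them.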
EXAMPLE 4

PARENTAGE TESTING

A family consisting of a sire, dam and offspring was typed
with respect to the 18 variable sites discussed above with no
exclusions found. This family had not been previously blood
typed. Using the preliminary allelic frequency numbers given in
Table 2, it is possible to construct a p(exc) table pertaining to
this specific case (Table 3). In general, this Table is
constructed assuming that the identity of the dam is not in
question (although in practice, it is possible to exclude the mare
if neither of her alleles is inherited by the foal). Table 3 shows
the typing data for the foal and its dam with the sites tested
listed in order of informativeness in this case. The overall cum
p(exc) using 18 loci was 0.942.



TABLE 3

LOCUS  FOAL  DAM  EXCLUDED SIRES  p(exc)  p(non-exc)  cum p(non-exc)  cum p(exc)
459-1  AC    CC   AA              0.524   0.476       0.476           0.524
129-1  AA    AT   TT              0.370   0.630       0.300           0.700
324-1  CC    CT   TT              0.321   0.679       0.204           0.796
595-3  GG    GG   AA              0.279   0.721       0.147           0.853
090-2  GG    AG   AA              0.217   0.783       0.115           0.885
324-2  CC    CT   TT              0.151   0.849       0.098           0.902
595-1  AA    AA   GG              0.092   0.818       0.080           0.920
007-3  AA    AA   GG              0.080   0.920       0.073           0.927
085-1  CC    CC   GG              0.071   0.929       0.068           0.932
474-1  AA    AA   TT              0.059   0.941       0.064           0.936
178-1  AA    AG   GG              0.043   0.957       0.061           0.939
595-2  GG    GG   TT              0.036   0.964       0.059           0.941
177-1  CC    CC   AA              0.018   0.982       0.058           0.942
459-2  CC    CC   GG              0.003   0.997       0.058           0.942
007-1  CG    CG   -               0.000   1.000       0.058           0.942
007-2  AG    AG   -               0.000   1.000       0.058           0.942
177-2  CT    CT   -               0.000   1.000       0.058           0.942
177-3  AG    AG   -               0.000   1.000       0.058           0.942
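
The case-specific p(exc) values in Table 3 appear consistent with a simple rule: a candidate sire is excluded at a site when he carries none of the alleles the foal could have received from its sire, and under Hardy-Weinberg proportions the frequency of such sires is the square of the combined frequency of the remaining alleles. The Python sketch below illustrates that rule; it is an interpretation offered under those assumptions, not the procedure stated in the specification. The function names are hypothetical, and the allele frequencies used in the example calls are taken from Table 4.

```python
def paternal_allele_options(foal, dam):
    """Alleles the sire could have contributed, given foal and dam genotypes."""
    first, second = foal
    options = set()
    if first in dam:   # dam supplied `first`, so the sire supplied `second`
        options.add(second)
    if second in dam:  # dam supplied `second`, so the sire supplied `first`
        options.add(first)
    return options


def case_p_exc(foal, dam, allele_freqs):
    """Probability that a random sire is excluded at this site in this case."""
    options = paternal_allele_options(foal, dam)
    if not options:
        raise ValueError("foal shares no allele with the dam; the dam is excluded")
    # A random sire is excluded only if both of his alleles fall outside `options`.
    compatible_freq = sum(allele_freqs[allele] for allele in options)
    return (1.0 - compatible_freq) ** 2


if __name__ == "__main__":
    # Locus 129-1: foal AA, dam AT, so the sire must have supplied an A.
    print(round(case_p_exc("AA", "AT", {"A": 0.392, "T": 0.608}), 3))
    # Locus 177-2: foal CT, dam CT, so either allele could be paternal.
    print(round(case_p_exc("CT", "CT", {"C": 0.500, "T": 0.500}), 3))
```

When foal and dam are the same heterozygote, either allele could have come from the sire, so no candidate can be excluded at that site; this matches the 0.000 entries at the bottom of Table 3.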
EXAMPLE 5

IDENTITY TESTING 

It is of interest to make use of the population analysis
group to derive preliminary information concerning other
aspects of the marker panel. For example, using the allelic
frequency data, it is possible to calculate a probability of
identity [p(ID)] value for the 18 sites which is equal to 4.79 x
10^-7 or approximately 1 in 2.1 million. Thus, one would predict
that none of the horses examined in the population group would
have the same genotype, and computer analysis of the genotype
database revealed this to be the case. As shown in Table 4, the
p(ID) reaches very small numbers with analysis of comparatively
few loci. Using the top seven sites, the probability of two
random animals having different genotypes is already 99.9%.



TABLE 4

LOCUS  GENOTYPE 1  GENOTYPE 2  GENOTYPE 3  p      q      p(ID)  cum p(ID)
       PP (#)      PQ (#)      QQ (#)
177-2  CC (18)     CT (23)     TT (18)     0.500  0.500  0.375  0.375
595-3  AA (14)     AG (28)     GG (11)     0.528  0.472  0.376  0.141
090-2  AA (13)     AG (28)     GG (17)     0.466  0.534  0.376  0.053
324-1  CC (11)     CT (30)     TT (19)     0.433  0.567  0.380  0.020
129-1  AA (7)      AT (33)     TT (20)     0.392  0.608  0.388  0.008
007-1  AA (22)     CG (29)     GG (9)      0.608  0.392  0.388  0.003
324-2  CC (21)     CT (24)     TT (9)      0.611  0.389  0.388  0.001
177-3  AA (26)     AG (25)     GG (9)      0.642  0.358  0.397  4.67 x 10^-4
595-1  AA (25)     AG (21)     GG (5)      0.696  0.304  0.422  1.97 x 10^-4
007-3  AA (27)     AG (32)     GG (1)      0.717  0.283  0.435  8.57 x 10^-4
459-1  AA (5)      AC (22)     CC (31)     0.276  0.724  0.440  3.77 x 10^-5
085-1  CC (32)     CG (24)     GG (4)      0.733  0.267  0.447  1.68 x 10^-5
007-2  AA (3)      AG (25)     GG (31)     0.263  0.737  0.450  7.58 x 10^-6
474-1  AA (35)     AT (21)     TT (4)      0.758  0.242  0.468  3.55 x 10^-6
178-1  AA (38)     AG (16)     GG (4)      0.793  0.207  0.505  1.79 x 10^-6
595-2  GG (34)     GT (13)     TT (3)      0.810  0.190  0.527  9.45 x 10^-7
177-1  AA (2)      AC (12)     CC (46)     0.133  0.867  0.618  5.84 x 10^-7
459-2  CC (53)     CG (6)      GG (0)      0.949  0.051  0.821  4.79 x 10^-7
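
The per-site p(ID) values in Table 4 are consistent with the usual definition of the probability of identity at a biallelic site, namely the sum of the squared Hardy-Weinberg genotype frequencies: p(ID) = p^4 + (2pq)^2 + q^4. The following Python sketch applies that formula to the allelic frequencies in Table 4. The formula is assumed rather than quoted from the specification, the function names are hypothetical, and small differences from the tabulated values can arise from rounding of p and q.

```python
from math import prod

# Frequency p of the first allele at each site, in the order used in Table 4.
ALLELE_FREQ_P = [
    0.500, 0.528, 0.466, 0.433, 0.392, 0.608, 0.611, 0.642, 0.696,
    0.717, 0.276, 0.733, 0.263, 0.758, 0.793, 0.810, 0.133, 0.949,
]


def site_p_id(p):
    """Chance that two random animals share a genotype at one biallelic site."""
    q = 1.0 - p
    return p ** 4 + (2.0 * p * q) ** 2 + q ** 4


def cumulative_p_id(freqs):
    """Chance that two random animals match at every site (independent loci)."""
    return prod(site_p_id(p) for p in freqs)


cum_all = cumulative_p_id(ALLELE_FREQ_P)
cum_top_seven = cumulative_p_id(ALLELE_FREQ_P[:7])

if __name__ == "__main__":
    print(f"p(ID), all 18 sites: {cum_all:.2e}")
    print(f"p(different), top 7 sites: {1.0 - cum_top_seven:.4f}")
```

The most informative sites (p close to 0.5) each contribute a factor of roughly 0.375, so the cumulative value falls by more than half with every such site added.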



False Report Rate

In the current study, two types of potential false reports
can be encountered due to either (1) PCR failures or
(2) incompatibility between the genotype obtained on opposite
strands. Only data from those animals which had been
successfully typed in both strands was included in the allelic
frequency calculations. Sixty horses typed with respect to 18
sites amounts to 1,080 genotypings. 95% of all typing
experiments were successful overall. No typing errors were due
to traditional PCR failures. 3.8% false reports were encountered
at the GBA step either because the PCR was unsuccessful at the
single strand step or due to operator error. 1.1% of all typings
produced incompatible data between the strands for unknown
reasons.

In sum, the GBA (genetic bit analysis) method is thus a
simple, convenient, and automatable method for interrogating
SNPs. In this method, sequence-specific annealing to a solid
phase-bound primer is used to select a unique polymorphic site
in a nucleic acid sample, and interrogation of this site is via a
highly accurate DNA polymerase reaction using a set of novel
non-radioactive dideoxynucleotide analogs. One of the most
attractive features of the GBA approach is that, because the
actual allelic discrimination is carried out by the DNA
polymerase, one set of reaction conditions can be used to
interrogate many different polymorphic loci. This feature
permits cost reductions in complex DNA tests by exploitation of
parallel formats and provides for rapid development of new
tests.

The intrinsic error rate of the GBA procedure in its present
format is believed to be low; the signal-to-noise ratio in terms
of correct vs. incorrect nucleotide incorporation for
homozygotes appears to be approximately 20:1. GBA is thus
sufficiently quantitative to allow the reliable detection of
heterozygotes in genotyping studies. The presence in the DNA
polymerase-mediated extension reaction of all four
dideoxynucleoside triphosphates as the sole nucleotide
substrates heightens the fidelity of genotype determinations by
suppressing misincorporation. GBA can be used in any
application where point mutation analyses are presently
employed (including genetic mapping and linkage studies,
genetic diagnoses, and identity/paternity testing), assuming
that the surrounding DNA sequence is known.

EXAMPLE 6

ANALYSIS OF A HUMAN SNP

Human single nucleotide polymorphisms may be used in the
same manner as the above-described equine polymorphisms.
Examples of suitable human polymorphisms are presented in
Table 5.

For the purpose of validating the strategy of converting
human SNPs to a GBA test format, a phenotypically neutral SNP
site was converted and tested by GBA. This site was selected
from the Johns Hopkins University OMB database of human
polymorphisms. The site is met-H on chromosome 7 at q31,
mutation position 127, A to G (Horn, G.T. et al., Clin. Chem. 36,
1614-1619, 1990). The following oligonucleotides were
synthesized (p=phosphorothioate):

PCR primer no. 1552 (SEQ ID NO:93)

5'-CpApTpCpCATGTAGGAGAGCCTTAGTC

PCR primer no. 1553 (SEQ ID NO:94)

5'-CCATTTTTGTGTCTTCTAGTCTAAGG

GBA primer no. 1554 (SEQ ID NO:95)

5'-TTGAAAGATCGTCAGAAAAATCC

Human DNA samples were randomly selected from the DNA
archives of two families available from the Centre d'Etude du
Polymorphisme Humain (CEPH) family collection. A negative
control, containing no DNA, was also used. Sample DNAs were
amplified by PCR using the above primers and the resulting
product was analyzed by GBA for two potential bases at the
polymorphic site, G and A. GBA results were obtained by an
endpoint reading of absorbance at 450 nm in a microtiter plate
reader. The data is presented in Table 6.

Samples 1, 2, 4, 6 and 8 were homozygous for A, samples 7
and 9 were homozygous for G and samples 3 and 5 were GA
heterozygotes. These DNAs have not been tested for this
biallelism by any other method to date.

TABLE 6

Sample  CEPH DNA     Absorbance (A450)   Genotype
No.     No.          Base G    Base A
1       [illegible]  .100      .556      AA
2       [illegible]  .084      .782      AA
3       [illegible]  .372      .369      GA
4       1333-05      .081      .905      AA
5       1333-07      .321      .346      GA
6       1333-08      .084      .803      AA
7       1340-09      .675      .092      GG
8       1340-10      .084      .756      AA
9       1340-12      .537      .096      GG
No DNA  N/A          .076      .097      N/A
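
The conversion of two endpoint absorbance readings into a genotype can be sketched as a simple thresholding step. The Python example below is illustrative only: the readings are those of Table 6, but the calling rule (a base is scored as present when its signal exceeds three times the no-DNA background for that base) and the names `BACKGROUND`, `THRESHOLD_FACTOR` and `call_genotype` are assumptions made for this sketch and are not described in the specification.

```python
# No-DNA control readings from Table 6, used here as the background level.
BACKGROUND = {"G": 0.076, "A": 0.097}

# Assumed cut-off: a base is called when its signal exceeds 3x background.
THRESHOLD_FACTOR = 3.0


def call_genotype(a450_g, a450_a, background=BACKGROUND, factor=THRESHOLD_FACTOR):
    """Call a biallelic G/A genotype from two A450 endpoint readings."""
    has_g = a450_g > factor * background["G"]
    has_a = a450_a > factor * background["A"]
    if has_g and has_a:
        return "GA"
    if has_g:
        return "GG"
    if has_a:
        return "AA"
    return "no call"


# (Base G reading, Base A reading) for samples 1-9 of Table 6.
READINGS = {
    1: (0.100, 0.556), 2: (0.084, 0.782), 3: (0.372, 0.369),
    4: (0.081, 0.905), 5: (0.321, 0.346), 6: (0.084, 0.803),
    7: (0.675, 0.092), 8: (0.084, 0.756), 9: (0.537, 0.096),
}

CALLS = {sample: call_genotype(g, a) for sample, (g, a) in READINGS.items()}

if __name__ == "__main__":
    for sample, genotype in CALLS.items():
        print(sample, genotype)
```

With these readings the positive and negative signals are well separated, so any cut-off between roughly 0.10 and 0.32 would give the same calls; a production assay would presumably set its thresholds from replicate controls.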




While the invention has been described in connection with
specific embodiments thereof, it will be understood that it is
capable of further modifications and this application is intended
to cover any variations, uses, or adaptations of the invention
following, in general, the principles of the invention and
including such departures from the present disclosure as come
within known or customary practice within the art to which the
invention pertains and as may be applied to the essential
features hereinbefore set forth and as follows in the scope of
the appended claims.

SEQUENCE LISTING 

(1) GENERAL INFORMATION:

(i) APPLICANT: MOLECULAR TOOL, INC.

(ii) TITLE OF INVENTION: SINGLE NUCLEOTIDE POLYMORPHISMS AND
THEIR USE IN GENETIC ANALYSIS

(iii) NUMBER OF SEQUENCES: 95

(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: HOWREY & SIMON
    (B) STREET: 1299 PENNSYLVANIA AVENUE, N.W.
    (C) CITY: WASHINGTON
    (D) STATE: D.C.
    (E) COUNTRY: US
    (F) ZIP: 20004

(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.25

(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER: US
    (B) FILING DATE:
    (C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: AUERBACH, JEFFREY I
    (B) REGISTRATION NUMBER: 32,680
    (C) REFERENCE/DOCKET NUMBER: 683-104-CIP-PCT

(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: (202) 383-7451
    (B) TELEFAX: (202) 383-6610



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
    (B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GCAGCTCTAA GTGCTGTGGG    20

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
    (B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TGCAGAAATT CTAAGGTGTT    20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
    (B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

AACACCTTAG AATTTCTGCA    20

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 20 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: single
    (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO

(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Equus caballus

(vii) IMMEDIATE SOURCE:
    (B) CLONE: 177-2

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCCACAGCAC TTAGAGCTGC    20
(2) INFORMATION FOR SEQ ID NO:5: 

30 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

3 5 (D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(iii) HYPOTHETICAL: NO 

40 

(iv) ANTI-SENSE: NO 

(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

45 

(vli) IMMEDIATE SOURCE: 

(B) CLONE: 595-3 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

50 

AGCTCTGGGA TGATCCACTA 20 
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(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 
5 (B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

10 

(Hi) HYPOTHETICAL: NO 
(Iv) ANTI-SENSE: NO 

1 5 (vi) ORIGINAL SOURCE: 

(A) ORGANISM: Equus caballus 

(vii) IMMEDIATE SOURCE: 

(B) CLONE: 595-3 

20 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 
TGAGGGAAAA ATGATGATGC 20 

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GCATCATCAT TTTTCCCTCA 20

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

TAGTGGATCA TCCCAGAGCT 20

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 090-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

AAAACTAATT TGATGGCCAT 20

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 090-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

AAAGTCAGAA CAATGATTGC 20

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 090-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GCAATCATTG TTCTGACTTT 20

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 090-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

ATGGCCATCA AATTAGTTTT 20

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

CACAAGGCCC AAGAACAGGA 20

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

TGAGTTCAGC GAGTGTCAGA 20

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

TCTGACACTC GCTGAACTCA 20

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

TCCTGTTCTT GGGCCTTGTG 20

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 129-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TGGGAAAGAC CACATTATTT 20

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 129-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

GTTCCCTTTT GTTTCAGACC 20

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 129-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

GGTCTGAAAC AAAAGGGAAC 20

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 129-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

AAATAATGTG GTCTTTCCCA 20

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

CATGAGTAAG AAGCATCCGG 20

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CCATGGAGTC ATAGATAAGT 20

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

ACTTATCTAT GACTCCATGG 20

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

CCGGATGCTT CTTACTCATG 20

(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

CCCAAGAACA GGATTGAGTT 20

(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

AGCGAGTGTC AGAGTTGTGT 20

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

ACACAACTCT GACACTCGCT 20

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 324-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

AACTCAATCC TGTTCTTGGG 20

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

AGCAAGAAAT GGGGGGCCTT 20

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

GTCCTACAAT TGCCAGGAAG 20

(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

CTTCCTGGCA ATTGTAGGAC 20

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

AAGGCCCCCC ATTTCTTGCT 20

(2) INFORMATION FOR SEQ ID NO:33:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:

GAATATCAAT ATATATATAT 20

(2) INFORMATION FOR SEQ ID NO:34:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34:

TGTGTGTGTG TGTATTTGCT 20

(2) INFORMATION FOR SEQ ID NO:35:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35:

AGCAAATACA CACACACACA 20

(2) INFORMATION FOR SEQ ID NO:36:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36:

ATATATATAT ATTGATATTC 20

(2) INFORMATION FOR SEQ ID NO:37:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37:

GCCATAATTA AGCCTGTATT 20

(2) INFORMATION FOR SEQ ID NO:38:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38:

GTTTGTTTTA AATTTTGTGA 20

(2) INFORMATION FOR SEQ ID NO:39:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39:

TCACAAAATT TAAAACAAAC 20

(2) INFORMATION FOR SEQ ID NO:40:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-3
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40:

AATACAGGCT TAATTATGGC 20

(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

GTGTAGAGTA GTTCAAGGAC 20

(2) INFORMATION FOR SEQ ID NO:42:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

ATGTCTTATA CCTCCCTTTT 20

(2) INFORMATION FOR SEQ ID NO:43:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43:

AAAAGGGAGG TATAAGACAT 20

(2) INFORMATION FOR SEQ ID NO:44:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44:

GTCCTTGAAC TACTCTACAC 20

(2) INFORMATION FOR SEQ ID NO:45:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 085-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

GTGAACGGAG AGCAGGCCTT 20

(2) INFORMATION FOR SEQ ID NO:46:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 085-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CCTGCTGAAG CCTCAGACCG 20

(2) INFORMATION FOR SEQ ID NO:47:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 085-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47:

CGGTCTGAGG CTTCAGCAGG 20

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 085-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

AAGGCCTGCT CTCCGTTCAC 20

(2) INFORMATION FOR SEQ ID NO:49:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:49:

CTGCTCTTTA GACTATGACC 20

(2) INFORMATION FOR SEQ ID NO:50:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50:

TCAACCTTGC ATCATGAGCT 20

(2) INFORMATION FOR SEQ ID NO:51:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

AGCTCATGAT GCAAGGTTGA 20

(2) INFORMATION FOR SEQ ID NO:52:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 007-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:52:

GGTCATAGTC TAAAGAGCAG 20

(2) INFORMATION FOR SEQ ID NO:53:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 474-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:53:

TTTGAGCTGG GACCTCAGTC 20

(2) INFORMATION FOR SEQ ID NO:54:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 474-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:54:

TCTCCTGCCT TTAGACTCGA 20

(2) INFORMATION FOR SEQ ID NO:55:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 474-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:55:

TCGAGTCTAA AGGCAGGAGA 20

(2) INFORMATION FOR SEQ ID NO:56:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 474-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:56:

GACTGAGGTC CCAGCTCAAA 20

(2) INFORMATION FOR SEQ ID NO:57:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 178-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:57:

GAACCTCTGG GCCGTGGATA 20

(2) INFORMATION FOR SEQ ID NO:58:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 178-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58:

TTGTTCAGAA GCACAGGTGA 20

(2) INFORMATION FOR SEQ ID NO:59:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 178-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:59:

TCACCTGTGC TTCTGAACAA 20

(2) INFORMATION FOR SEQ ID NO:60:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 178-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:60:

TATCCACGGC CCAGAGGTTC 20

(2) INFORMATION FOR SEQ ID NO:61:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:61:

GTATTTGCTA GCTCTGGGAT 20

(2) INFORMATION FOR SEQ ID NO:62:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:62:

ATCCACTAAT GAGGGAAAAA 20

(2) INFORMATION FOR SEQ ID NO:63:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:63:

TTTTTCCCTC ATTAGTGGAT 20

(2) INFORMATION FOR SEQ ID NO:64:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 595-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:64:

ATCCCAGAGC TAGCAAATAC 20

(2) INFORMATION FOR SEQ ID NO:65:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:65:

GAAGTTGTGG GACAGATGTG 20

(2) INFORMATION FOR SEQ ID NO:66:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:66:

AGAGATGCAG CTCTAAGTGC 20

(2) INFORMATION FOR SEQ ID NO:67:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:67:

GCACTTAGAG CTGCATCTCT 20

(2) INFORMATION FOR SEQ ID NO:68:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 177-1
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:68:

CACATCTGTC CCACAACTTC 20

(2) INFORMATION FOR SEQ ID NO:69:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:69:

CCATGAGGAA GCCTCCACAA 20

(2) INFORMATION FOR SEQ ID NO:70:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:70:

GTCCCAATAG TCTGGGATTC 20

(2) INFORMATION FOR SEQ ID NO:71:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:71:

GAATCCCAGA CTATTGGGAC 20

(2) INFORMATION FOR SEQ ID NO:72:

(i) SEQUENCE CHARACTERISTICS:
     (A) LENGTH: 20 base pairs
     (B) TYPE: nucleic acid
     (C) STRANDEDNESS: single
     (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO
(vi) ORIGINAL SOURCE:
     (A) ORGANISM: Equus caballus
(vii) IMMEDIATE SOURCE:
     (B) CLONE: 459-2
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:72:

TTGTGGAGGC TTCCTCATGG 20

(2) INFORMATION FOR SEQ ID NO:73:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:73:
AAAGCAGACT ACGAGAAACA CAAA 24

(2) INFORMATION FOR SEQ ID NO:74:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:74:
TCTACGCCTG CGAAGTCACC CATC 24

(2) INFORMATION FOR SEQ ID NO:75:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:75:
GATGGGTGAC TTCGCAGGCG TAGA 24

(2) INFORMATION FOR SEQ ID NO:76:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: IGKC 2p12

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:76:
TTTGTGTTTC TCGTAGTCTG CTTT 24

(2) INFORMATION FOR SEQ ID NO:77:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:77:
CTCCTGCAAT TGACAGAGAG CTCC 24

(2) INFORMATION FOR SEQ ID NO:78:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:78:
GAGGCAGAGA ACAGCACCCA AGGT 24

(2) INFORMATION FOR SEQ ID NO:79:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79:
ACCTTGGGTG CTGTTCTCTG CCTC 24

(2) INFORMATION FOR SEQ ID NO:80:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: ILIB 2q3-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:80:
GGAGCTCTCT GTCAATTGCA GGAG 24

(2) INFORMATION FOR SEQ ID NO:81:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: LDLR 19p13.3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:81:
CTCCATCTCA AGCATCGATG TCAA 24

(2) INFORMATION FOR SEQ ID NO:82:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: LDLR 19p13.3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:82:
GGGGGCAACC GGAAGACCAT CTTG 24

(2) INFORMATION FOR SEQ ID NO:83:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: LDLR 19p13.3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:83:
CAAGATGGTC TTCCGGTTGC CCCC 24

(2) INFORMATION FOR SEQ ID NO:84:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: LDLR 19p13.3

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:84:
TTGACATCGA TGCTTGAGAT GGAG 24

(2) INFORMATION FOR SEQ ID NO:85:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:85:
GTTTGGTCTA AGTTGCTGAT TACC 24

(2) INFORMATION FOR SEQ ID NO:86:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:86:
GGATTTTTCT GACGATCTTT CAAC 24

(2) INFORMATION FOR SEQ ID NO:87:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:87:
GTTGAAAGAT CGTCAGAAAA ATCC 24

(2) INFORMATION FOR SEQ ID NO:88:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:88:
GGTAATCAGC AACTTAGACC AAAC 24

(2) INFORMATION FOR SEQ ID NO:89:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: PROC 2q13-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:89:
GCTGACAGCG GCCCACTGCA TGGA 24

(2) INFORMATION FOR SEQ ID NO:90:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: PROC 2q13-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:90:
GAGTCCAAGA AGCTCCTTGT CAGG 24

(2) INFORMATION FOR SEQ ID NO:91:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: PROC 2q13-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:91:
CCTGACAAGG AGCTTCTTGG ACTC 24

(2) INFORMATION FOR SEQ ID NO:92:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: PROC 2q13-q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:92:
TCCATGCAGT GGGCCGCTGT CAGC 24

(2) INFORMATION FOR SEQ ID NO:93:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:93:
CATCCATGTA GGAGAGCCTT AGTC 24

(2) INFORMATION FOR SEQ ID NO:94:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 26 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:94:
CCATTTTTGT GTCTTCTAGT CTAAGG 26

(2) INFORMATION FOR SEQ ID NO:95:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(vi) ORIGINAL SOURCE:
(A) ORGANISM: Homo sapiens

(vii) IMMEDIATE SOURCE:
(B) CLONE: MET-H 7q31

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:95:
TTGAAAGATC GTCAGAAAAA TCC 23

WHAT IS CLAIMED IS:

1. A nucleic acid molecule:

(i) having a nucleotide sequence capable of specifically
hybridizing to the invariant proximal or invariant distal
nucleotide sequence of a single nucleotide polymorphism, and
(ii) being used to specifically detect the single nucleotide
polymorphic site (X) of the single nucleotide polymorphism.

2. The nucleic acid molecule of claim 1, wherein said mammal 
is selected from the group consisting of humans, non-human 
primates, dogs, cats, cattle, sheep, poultry, and horses. 

3. The nucleic acid molecule of claim 2, wherein said mammal
is a horse.

4. The nucleic acid molecule of claim 3, wherein said molecule
has a nucleotide sequence selected from the group consisting
of SEQ ID NO:(2n+1), wherein n is an integer selected from
the group consisting of 0 through 35.

5. The nucleic acid molecule of claim 3, wherein the sequence
of said immediately 3'-distal segment includes a sequence
selected from the group consisting of SEQ ID NO:(2n+2),
wherein n is an integer selected from the group consisting of
0 through 35.
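
As an illustration only of the indexing used in claims 4 and 5, the following minimal Python sketch enumerates the SEQ ID NO pairs given by (2n+1) and (2n+2) for n = 0 through 35. The helper name `seq_id_pairs` is hypothetical and forms no part of the application.

```python
def seq_id_pairs(n_max=35):
    """Return the (2n+1, 2n+2) SEQ ID NO pairs for n = 0..n_max.

    Hypothetical helper: the first member of each pair is the index used in
    claim 4, the second member is the index used in claim 5.
    """
    return [(2 * n + 1, 2 * n + 2) for n in range(n_max + 1)]


pairs = seq_id_pairs()
print(pairs[0], pairs[-1], len(pairs))
```

Under this indexing the 36 pairs cover SEQ ID NO:1 through SEQ ID NO:72, the same range recited in claim 6.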

6. A nucleic acid molecule having a sequence complementary to
a sequence selected from the group consisting of SEQ ID NO:1
through SEQ ID NO:72 in Table 1.
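
A complementary sequence of the kind recited in claim 6 can be computed mechanically. The sketch below is a minimal Python illustration that assumes standard Watson-Crick pairing and reads the complement in the 5' to 3' direction (the reverse complement). The function name is hypothetical; the two sequences are copied from the sequence listing above.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence, ignoring spaces."""
    seq = seq.replace(" ", "").upper()
    return "".join(COMPLEMENT[base] for base in reversed(seq))


SEQ_ID_69 = "CCATGAGGAA GCCTCCACAA"  # from the sequence listing
SEQ_ID_72 = "TTGTGGAGGC TTCCTCATGG"  # from the sequence listing

print(reverse_complement(SEQ_ID_69))
print(reverse_complement(SEQ_ID_69) == SEQ_ID_72.replace(" ", ""))
```

Run on these inputs, the check shows that SEQ ID NO:72 as listed is the reverse complement of SEQ ID NO:69.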

7. A set of at least two of the nucleic acid molecules of claim 
6. 


8. A set of at least two nucleic acid molecules, wherein at
least one of said nucleic acid molecules has a sequence
complementary to a sequence selected from the group
consisting of SEQ ID NO:1 through SEQ ID NO:72.

9. A method for determining the extent of genetic similarity
between DNA of a target horse and DNA of a reference horse,
which comprises the steps:
A) determining, for a single nucleotide polymorphism of said
target horse, and for a corresponding single nucleotide
polymorphism of said reference horse, whether said
polymorphisms contain the same single nucleotide at
their respective polymorphic sites; and
B) using said comparison to determine the extent of genetic
similarity between said target horse and said reference
horse.
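
One simple way to express the comparison of steps A and B numerically is the fraction of shared polymorphic sites at which the two animals carry the same nucleotide. The Python sketch below is an assumption made for illustration: the claim does not prescribe a particular measure, the site names and nucleotide calls are invented, and each animal is reduced to a single nucleotide per site.

```python
def snp_similarity(target, reference):
    """Fraction of commonly typed SNP sites with identical nucleotides.

    `target` and `reference` map a site name to the single nucleotide
    observed at that polymorphic site (hypothetical data model).
    """
    shared = [site for site in target if site in reference]
    if not shared:
        raise ValueError("no SNP sites typed in both animals")
    matches = sum(target[site] == reference[site] for site in shared)
    return matches / len(shared)


# Invented example calls for four hypothetical sites.
target_horse = {"site1": "A", "site2": "C", "site3": "G", "site4": "T"}
reference_horse = {"site1": "A", "site2": "T", "site3": "G", "site4": "T"}

print(snp_similarity(target_horse, reference_horse))
```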



10. The method of claim 9, wherein said polymorphic sites have
(1) an immediately 5'-proximal sequence selected from the
group consisting of SEQ ID NO:(2n+1), and (2) an immediately
3'-distal sequence selected from the group consisting of
SEQ ID NO:(2n+2); wherein n is an integer selected from the
group consisting of 0 through 35.

11. The method of claim 9, wherein in step A, said 
determination is sufficient to establish that said target 
horse and said reference horse are not the same animal. 



12. The method of claim 9, wherein in step A, said
determination is sufficient to establish that said reference
horse is not a parent of said target horse.

13. The method of claim 9, wherein in step A, said reference
horse has a trait, and said determination is sufficient to
establish that said target horse also has said trait.



14. The method of claim 9, wherein in step A, said reference
horse has a first and second trait, and said determination is
sufficient to establish a genetic linkage between said
traits.

15. The method of claim 9, wherein in step A, said
determination is accomplished by a method having the
sub-steps:

(a) incubating a sample of nucleic acid containing said
single nucleotide polymorphism of said target horse, or
said single nucleotide polymorphism of said reference
horse, in the presence of a nucleic acid primer and at
least one dideoxynucleotide derivative, under
conditions sufficient to permit a polymerase mediated,
template-dependent extension of said primer, said
extension causing the incorporation of a single
dideoxynucleotide to the 3'-terminus of said primer,
said single dideoxynucleotide being complementary to
the single nucleotide of the polymorphic site of said
polymorphism;

(b) permitting said template-dependent extension of said
primer molecule, and said incorporation of said single
dideoxynucleotide; and

(c) determining the identity of the nucleotide incorporated
into said polymorphic site, said identified nucleotide
being complementary to said nucleotide of said
polymorphic site.
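
The logic of sub-steps (a) through (c) can be sketched in a few lines of Python. The sketch assumes a template strand written 5' to 3', a primer that anneals immediately 3' of the polymorphic site on that template, and standard Watson-Crick pairing; the sequences and the function names are invented for illustration and are not taken from the application.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def incorporated_ddntp(template, primer):
    """Return (ddNTP added to the primer 3' end, template base it pairs with).

    The primer binds the template region equal to its reverse complement.
    Extension of the primer 3' terminus copies the template base lying
    immediately 5' of that region, i.e. the polymorphic site.
    """
    binding_site = reverse_complement(primer)
    start = template.find(binding_site)
    if start < 1:  # not found, or no template base left to copy
        raise ValueError("primer cannot be extended on this template")
    site_base = template[start - 1]
    return "dd" + COMPLEMENT[site_base] + "TP", site_base


# Hypothetical template (5'->3') and primer (5'->3').
template = "TTACGGTCCCAATAGTC"
primer = "CTATTGGGAC"

print(incorporated_ddntp(template, primer))
```

In this invented example the base at the polymorphic site is G, so the single dideoxynucleotide incorporated is ddCTP, and reading the incorporated nucleotide identifies the site by complementarity, as in sub-step (c).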

16. The method of claim 15, wherein in substep (a), said primer
is immobilized to a solid support, and wherein in sub-step
(b), said template-dependent extension of said primer is
conducted on said immobilized primer.

17. The method of claim 15, wherein, in sub-step (a), said
sample is processed to amplify a nucleic acid containing
said polymorphism prior to said incubation.

18. The method of claim 15, wherein substep (a) additionally
includes using a non-invasive swab to collect said sample
of DNA from said horse.



19. The method of claim 15, wherein in substep (a), said
polymerase mediated, template-dependent extension of said
primer is conducted in the presence of at least two
dideoxynucleotide triphosphate derivatives selected from
the group consisting of ddATP, ddTTP, ddCTP and ddGTP, but
in the absence of dATP, dTTP, dCTP and dGTP.

20. A method for determining the probability that a target
horse will have a particular trait, which comprises the
steps:

A) determining the identity of a single nucleotide present
at a polymorphic site of an equine single nucleotide
polymorphism, and being present in more than 51% of a
set of reference horses;

B) determining whether a single nucleotide present at a
polymorphic site of a corresponding single nucleotide
polymorphism of said target horse has the same identity
as the single nucleotide present at said polymorphic
site of said 51% of reference horses exhibiting said
trait;

C) using said determination of step B to establish the
probability that said target horse will have said
particular trait.
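
Steps A and B can be illustrated with a short Python sketch. It assumes one nucleotide call per reference horse at the polymorphic site and treats "more than 51%" as a strict threshold on the observed frequency. The claim does not specify how the probability of step C is calculated, so the sketch stops at the comparison of step B; the function names and example calls are hypothetical.

```python
from collections import Counter


def predominant_nucleotide(reference_calls, threshold=0.51):
    """Return (nucleotide, frequency) for the most common call.

    The nucleotide is None when no call exceeds `threshold`.
    """
    if not reference_calls:
        raise ValueError("no reference horses were typed")
    base, count = Counter(reference_calls).most_common(1)[0]
    frequency = count / len(reference_calls)
    return (base if frequency > threshold else None), frequency


def target_matches(reference_calls, target_call, threshold=0.51):
    """Step B: does the target carry the predominant reference nucleotide?"""
    base, _ = predominant_nucleotide(reference_calls, threshold)
    return base is not None and base == target_call


# Invented calls for ten reference horses exhibiting the trait.
reference_calls = ["A"] * 8 + ["G"] * 2

print(predominant_nucleotide(reference_calls))
print(target_matches(reference_calls, "A"))
```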

21. The method of claim 20, wherein said equine single
nucleotide polymorphism has (1) an immediately 5'-proximal
sequence selected from the group consisting of
SEQ ID NO:(2n+1); and (2) an immediately 3'-distal sequence
selected from the group consisting of SEQ ID NO:(2n+2);
wherein n is an integer selected from the group consisting
of 0 through 35.

22. The method of claim 20, wherein said trait is an equine 
genetic disease. 

23. The method of claim 20, wherein said trait is an equine
condition.

24. The method of claim 20, wherein said trait is an equine
characteristic.

25. A method for creating a genetic map of unique sequence
equine polymorphisms which comprises the steps:
A) identifying at least one pair of inter-breeding reference
horses, wherein each of said pairs of horses is
characterized by having a first and a second reference
horse,
said first reference horse having:
two alleles (i) and (ii), said alleles each being single
nucleotide polymorphic alleles having a single
nucleotide polymorphic site;
said second reference horse having:
a corresponding allele (i') to said allele (i) of said first
reference horse, wherein said allele (i') has a single
nucleotide polymorphic site, and wherein the single
nucleotide present at said polymorphic site of said
allele (i') differs from the single nucleotide present at
the polymorphic site of said allele (i) of said first
reference horse, and
B) identifying in a progeny of at least one of said pairs of
inter-breeding reference horses the single nucleotide
present at a single nucleotide polymorphic site of a
corresponding allele of said alleles (i) and (i'), and the
single nucleotide present at a single nucleotide
polymorphic site of a corresponding allele of said
alleles (ii) and (ii'); and
C) determining the extent of genetic linkage between said
alleles (i) and (ii), to thereby create said genetic map.
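
The "extent of genetic linkage" of step C is commonly summarized as a recombination fraction. The claim does not name a statistic, so the Python sketch below swaps in that standard measure as an assumption. It further assumes that the phase of the first reference horse is known and that each progeny has been reduced to the two-site haplotype inherited from that horse; the data and names are invented.

```python
def recombination_fraction(progeny_haplotypes, parental_haplotypes):
    """Fraction of progeny haplotypes that are not a parental combination.

    Each haplotype is a (nucleotide at site i, nucleotide at site ii) tuple.
    """
    if not progeny_haplotypes:
        raise ValueError("no progeny were typed")
    recombinants = sum(
        haplotype not in parental_haplotypes for haplotype in progeny_haplotypes
    )
    return recombinants / len(progeny_haplotypes)


# Hypothetical phase of the first reference horse: A-C and G-T.
parental = {("A", "C"), ("G", "T")}

# Invented progeny: 18 parental haplotypes and 2 recombinants.
progeny = [("A", "C")] * 9 + [("G", "T")] * 9 + [("A", "T"), ("G", "C")]

print(recombination_fraction(progeny, parental))
```

A fraction near 0 suggests tight linkage between the two sites, while a fraction near 0.5 is what independent assortment would produce.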

26. The method of claim 25, wherein said steps A, B and C are
repeated at least once in cycle, to thereby create a genetic
map having more than two polymorphic sites.

27. The method of claim 25, wherein at least one of said alleles
(i) and (ii) has (1) an immediately 5'-proximal sequence
selected from the group consisting of SEQ ID NO:(2n+1); and
(2) an immediately 3'-distal sequence selected from the
group consisting of SEQ ID NO:(2n+2); wherein n is an
integer selected from the group consisting of 0 through 35.

28. A method for predicting whether a target horse will exhibit
a predetermined trait which comprises the steps:

A) identifying one or more alleles associated with said
trait, each allele being a single nucleotide polymorphic
allele having a single nucleotide polymorphic site;

B) determining for each of said single nucleotide
polymorphic alleles, a nucleotide present at said allele's
polymorphic site in a reference horse exhibiting said
trait, to thereby define a set of single nucleotides at a
set of polymorphic sites that are present in a reference
horse exhibiting said trait;

C) determining the identity of single nucleotides present
at corresponding single nucleotide polymorphic alleles
of said target horse; and

D) comparing the identity of the single nucleotides present
at the polymorphic sites of the polymorphisms of said
reference animal with the single nucleotides present at
said corresponding single nucleotide polymorphic alleles
of said target horse.



29. The method of claim 28, wherein at least one of said
polymorphisms has (1) an immediately 5'-proximal sequence
selected from the group consisting of SEQ ID NO:(2n+1); and
(2) an immediately 3'-distal sequence selected from the
group consisting of SEQ ID NO:(2n+2); wherein n is an
integer selected from the group consisting of 0 through 35.

30. A method for identifying a single nucleotide polymorphic
site which comprises:

A) isolating a fragment of genomic DNA of a reference
organism;

B) sequencing said fragment of DNA to thereby determine
the nucleotide sequence of a segment of said fragment,
said segment being of a length sufficient to define the
nucleotide sequence of a pair of oligonucleotide primers
capable of mediating the specific amplification of said
fragment;

C) using said oligonucleotide primers to mediate the
specific amplification of DNA obtained from a plurality
of other organisms of the same species as said
reference organism; and

D) determining the nucleotide sequences of said amplified
DNA molecules of step C, and comparing the sequence of
said amplified molecules with the sequence of said
fragment of said reference organism to thereby identify
a single nucleotide polymorphic site.
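
The comparison of step D can be sketched as a column-by-column scan of the sequenced fragments. The Python illustration below assumes the amplified sequences are already aligned to the reference and have the same length, so insertions and deletions are out of scope; the sequences and the function name are invented for the example.

```python
def find_snp_sites(reference, others):
    """Return {position: sorted nucleotides} for positions that vary.

    Positions are 0-based offsets into `reference`. Every sequence in
    `others` must have the same length as `reference`.
    """
    for seq in others:
        if len(seq) != len(reference):
            raise ValueError("sequences must be pre-aligned to equal length")
    sites = {}
    for pos, ref_base in enumerate(reference):
        observed = {ref_base} | {seq[pos] for seq in others}
        if len(observed) > 1:
            sites[pos] = sorted(observed)
    return sites


# Invented reference fragment and amplified sequences from other animals.
reference_fragment = "ACGTTGCAAT"
amplified = ["ACGTTGCAAT", "ACGTCGCAAT", "ACGTTGCAAT"]

print(find_snp_sites(reference_fragment, amplified))
```

Here the only varying position is offset 4, where T and C are both observed, which marks a candidate single nucleotide polymorphic site.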

31. A method for interrogating a polymorphic region of a
human single nucleotide polymorphism of a target human,
said method comprising:

A) selecting a known human single nucleotide
polymorphism for interrogation;

B) identifying the sequence of at least one oligonucleotide
that flanks said selected single nucleotide
polymorphism; said identified sequence being of a length
sufficient to permit the identification of primers
capable of being used to effect the specific
amplification of said flanking oligonucleotide and said
polymorphism;

C) using said primers to effect the amplification of said
flanking oligonucleotide and said polymorphism of said
single nucleotide polymorphism of said target human;
and

D) interrogating the single nucleotide polymorphism of
said amplified polymorphism by genetic bit analysis.